How do you prompt this model?
Is the actual model uploaded and available for use? The adapter seems like it may be something different, but it is hard to tell from the available documentation. I have been trying to get prompting working with an offline model for days. Do you have any guidance for this? I would simply like to prompt the model and get a response. I am using my institution's cluster computing service, and I know that I have access to all of the resources I need.
Happy to help with this if I can. Do you have an example prompt you are using?
Thank you so much! Since I wrote this, I have successfully downloaded the full model. The sample prompt I am using to test things is: "<< What decision would a person most likely make given the following situation: A person is deciding whether to save money or buy a new gadget they've wanted. >>". I will attach sample code as well (a simplified sketch is below), and it seems to work fine. I am still waiting on resources; the only issue I have encountered so far is not requesting enough of them. Our institution has 40GB A100s, so I have requested four of them (instead of the two 80GB A100s the model card says it requires), and hopefully that will be enough.
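Roughly, my loading script looks like the following. This is a simplified sketch rather than my exact code; I am assuming the full model is the marcelbinz/Llama-3.1-Centaur-70B repo and relying on device_map="auto" to shard it across the four GPUs:

```python
# Simplified sketch: load the full 70B model sharded across all visible GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "marcelbinz/Llama-3.1-Centaur-70B"  # assumed repo ID for the full model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~140 GB of weights at 16-bit for a 70B model
    device_map="auto",           # let accelerate shard layers across the 4 GPUs
)

prompt = (
    "What decision would a person most likely make given the following "
    "situation: A person is deciding whether to save money or buy a new "
    "gadget they've wanted."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```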
I am also curious about how to get the adapter working, since it would be less computationally expensive and it would be easier to get access to the required resources. The issue I run into is that our GPU nodes do not allow Internet connections while they are running. Am I correct in assuming that using the adapter requires my code to make requests over the Internet to the full model? Is there a way to get the adapter running offline, without needing to load both the adapter and the full model into memory?
I am interested in slightly less computationally expensive approaches (is the full 8B model available?).
Thank you so much for taking the time to read this and answer even some of the questions involved.
We have recently released a (still somewhat experimental) 8B version of the model: https://huggingface.co/marcelbinz/Llama-3.1-Centaur-8B-adapter
I rephrased your prompt a bit so that it is more similar to the experiments we used during finetuning, and created a simple Colab notebook with the 8B model (it runs on a free T4 instance): https://colab.research.google.com/drive/1JnMWUsFO5t2vMkGCdRyr9t2Acwvo1OBy?usp=sharing
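If you want to run the same thing outside of Colab, the core of it looks roughly like this. This is a sketch, not the exact notebook code: it assumes peft is installed so that transformers can resolve the adapter's base model from its adapter_config.json, that the adapter repo ships the tokenizer files (otherwise load the tokenizer from the base model), and it quantizes to 8-bit as one way to fit on a 16 GB T4:

```python
# Rough sketch: load the 8B adapter (plus its base model) and query it with a
# prompt phrased like the finetuning data.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

adapter_id = "marcelbinz/Llama-3.1-Centaur-8B-adapter"

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(
    adapter_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # fits on a T4
)

# During finetuning, choices were wrapped in << and >>, so prompts tend to work
# best when the model completes the text inside those markers.
prompt = (
    "You have to decide between saving money and buying a gadget you have "
    "wanted for a while. You choose <<"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```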
Using the adapter doesn’t require Internet access, but downloading it of course does (as with any other model hosted on HuggingFace). If you don’t have Internet access on a compute node, you can download the model in advance using the HuggingFace CLI: https://huggingface.co/docs/huggingface_hub/en/guides/cli
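One thing to keep in mind: a LoRA adapter is only a small set of extra weights, so it always runs on top of its base model. What makes the 8B version cheaper is that the base is 8B instead of 70B. For the offline workflow, something like the following sketch should work (the same thing can be done with the CLI linked above; note that I am using meta-llama/Meta-Llama-3.1-8B as a placeholder for the base model ID here; check the adapter's adapter_config.json for the exact one):

```python
# Step 1 (login node, with Internet): cache the adapter and its base model.
from huggingface_hub import snapshot_download

snapshot_download("marcelbinz/Llama-3.1-Centaur-8B-adapter")
snapshot_download("meta-llama/Meta-Llama-3.1-8B")  # placeholder base model ID

# Step 2 (compute node, no Internet): force the libraries to use the cache.
import os
os.environ["HF_HUB_OFFLINE"] = "1"  # must be set before loading the model
# ...then load the adapter exactly as in the Colab sketch above.
```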
Hope that helps!