How to download and use the model.
Hello, does anyone have a snippet of Python code showing how to download and use the model, or anything else that walks through the procedure?
Hello, you can download the model with Git LFS and then run it using the inference script in the GitHub repo.
- Accept the Llama 2 license and download the Llama 2 weights
- Download the Amharic finetune from this repository as shown here: https://huggingface.co/docs/hub/models-downloading (see the sketch after this list)
- Clone the GitHub repo and put your paths to Llama 2 and the PEFT model into the inference script here: https://github.com/iocuydi/amharic-llama-llava/blob/main/inference/run_inf.py
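If you'd rather script the download than use Git LFS, something like this should work. A minimal sketch using huggingface_hub; the repo id and local path are assumptions, substitute the actual repository for this model:

from huggingface_hub import snapshot_download

# Download every file in the model repo to a local directory.
# NOTE: repo_id is an assumption; substitute the actual repo for this model.
local_dir = snapshot_download(
    repo_id="iocuydi/llama-2-amharic-3784m",
    local_dir="/model/llama-2-amharic-3784m",
)
print(local_dir)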
What is the peft model?
This line inside the run_inf.py file doesn't seem to import:
from model_utils import load_model, load_peft_model
I can't find the model_utils file anywhere in the GitHub repo.
Added that file to the GitHub repo.
PEFT stands for "Parameter-Efficient Fine-Tuning." It allows large models to be finetuned more easily; more about it here: https://huggingface.co/blog/peft
With this and most Llama finetunes, you'll load the original Llama weights and then a smaller set of PEFT weights from the finetune.
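If you want to see what that looks like outside the repo's helper functions, here's a minimal sketch using the transformers and peft libraries directly (paths are placeholders):

from transformers import LlamaForCausalLM
from peft import PeftModel

# Load the base Llama 2 weights (Hugging Face format)...
base_model = LlamaForCausalLM.from_pretrained("/path/to/Llama-2-7b-hf")
# ...then apply the small PEFT (adapter) checkpoint on top.
model = PeftModel.from_pretrained(base_model, "/path/to/amharic-peft-checkpoint")
model.eval()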
Thank you for doing that. So I did the following as you described:
- Downloaded the llama-2-7b model using the download.sh script
- Downloaded this Amharic model using Git LFS from Hugging Face
- Cloned the GitHub repository and put the path to the Llama model in the run_inf.py file
Questions:
- Where do I use the Amharic model I downloaded from here (step 2 above)?
- What exactly is the path below?
peft_model = '/path/to/checkpoint'
- How do I replace the Llama-2 tokenizer with the Llama-2-Amharic tokenizer?
Thank you.
Forgot to mention: you first need to convert the Llama 2 weights to Hugging Face format with this script: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
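The usual invocation looks like this (paths are placeholders; check the script's --help for the exact flags in your transformers version):

python convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights \
    --model_size 7B \
    --output_dir /model/Llama-2-7b-hf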
- The "main_path" param should point at the directory with the Llama 2 weights after they are converted to Hugging Face format.
- The PEFT model path is the path to the finetuned checkpoint. Without loading a checkpoint, you're just using the original Llama 2. This path should point to a directory containing the files downloaded from this HF repository (the finetuned weights).
- Replace the tokenizer files that come with Llama 2 with the tokenizer files from this repository (see the sketch after this list).
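Concretely, that means copying this repo's tokenizer files over the ones in your converted Llama 2 directory. A sketch; the exact file list is an assumption, copy whichever tokenizer files this repository actually ships:

import shutil

# NOTE: the file names below are an assumption; copy whichever
# tokenizer files this repository actually provides.
for name in ("tokenizer.model", "tokenizer_config.json", "special_tokens_map.json"):
    shutil.copy(f"/model/llama-2-amharic-3784m/{name}", f"/model/Llama-2-7b-hf/{name}")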
Thank you!! Regarding the tokenizer files, would replacing only the tokenizer.model file work? I tried that, and it does respond in Amharic, though I'm not sure whether replacing the remaining files would improve its output.
You should replace all the applicable tokenizer files with ours. A couple of other tips for prompting:
- Try different system prompts (the initial instruction about being an Amharic assistant), but keep the system prompt in English
- Experiment with different hyperparameters depending on the task; higher top-k/temperature can give more varied and creative answers, but also more chance of hallucinations and wrong answers (see the sketch after this list)
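With a transformers-style generate call, those knobs look roughly like this. A sketch assuming `model` was loaded as in the steps above and the Amharic tokenizer files were swapped in; the values are just starting points:

from transformers import LlamaTokenizer

# Assumes the Amharic tokenizer files have replaced the Llama 2 ones.
tokenizer = LlamaTokenizer.from_pretrained("/model/Llama-2-7b-hf")
prompt = "You are a helpful assistant that answers in Amharic.\n\n<your question here>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,  # higher = more varied/creative, more hallucination risk
    top_k=50,         # higher = wider candidate pool at each step
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))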
Thanks for the tips.
I was thinking of continuing the pretraining with more Amharic data. Unfortunately, I wasn't really able to find good resources on how to do that. Can you recommend some helpful resources for achieving that?
The scripts in the GitHub repo can be used for pretraining and finetuning. Unless you have a massive amount of Amharic data (billions of tokens), additional pretraining likely won't help much; finetuning would be a more effective strategy. You can also check out the Chinese LLaMA/Alpaca paper and repo for more details, since much of this work was based on that.
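For finetuning, the usual peft pattern is to wrap the base model in a LoRA config and train only the adapter weights. A minimal sketch; the config values are illustrative and the repo's training scripts are the reference:

from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model

base = LlamaForCausalLM.from_pretrained("/model/Llama-2-7b-hf")
lora_config = LoraConfig(
    r=8,                                  # adapter rank (illustrative)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # which projections get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter is trainable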
Alright, thanks a lot for your support!!
One more thing. I tried to finetune the model on top of loading the gari model with PEFT. Then, when I run inference by loading both the gari PEFT and my finetuned PEFT one after another and ask a question, it no longer gives an answer it previously got right. For example, if I ask "what medicine should I take if I have the flu," it answers well with just the gari PEFT, but outputs gibberish when I load both the gari PEFT and the newer finetuned PEFT.
MAIN_PATH = '/model/Llama-2-7b-hf'  # base Llama 2 weights (HF format)
peft_model = '/model/llama-2-amharic-3784m'  # the gari PEFT checkpoint
# newer finetuned version on top of the gari model
peft_model2 = '/home/user/model/output'
model = load_model(MAIN_PATH, quantization)
model = load_peft_model(model, peft_model)
model = load_peft_model(model, peft_model2)
Is the way I'm loading both peft models correct?
Only load one PEFT model. If you load another, you're replacing the weights of the first one; they aren't meant to be mixed. In general, you load a single base Llama model and, optionally, a single PEFT model.
For your case, it sounds like you should follow these steps:
- Load Llama 2 with my PEFT model, then finetune
- After training, load Llama 2 with your PEFT model, then perform inference, additional finetuning, etc.
If your model isn't performing as expected, there may be an issue with your dataset or training process. One way to debug is to first try a very simple dataset of a couple thousand identical items (all the same training example) and see if you can get the model to overfit, reach roughly zero loss, and inference properly before moving on to the actual dataset. A quick way to build that sanity-check set is sketched below.
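A sketch for generating that overfit-test dataset; the record format here is hypothetical, so match whatever schema your training script expects:

import json

# Hypothetical record format; match your training script's schema.
example = {"instruction": "Say hello.", "output": "Hello!"}
with open("debug_dataset.json", "w", encoding="utf-8") as f:
    json.dump([example] * 2000, f, ensure_ascii=False)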