---
base_model: meta-llama/Meta-Llama-3.1-8B
datasets:
- HuggingFaceH4/ultrachat_200k
- mathewhe/OpenHermes-2.5-Formatted
- princeton-nlp/gemma2-ultrafeedback-armorm
license: llama3.1
tags:
- text
---

# Llama-3.1-8B-Chat

`meta-llama/Meta-Llama-3.1-8B` fine-tuned for chat completions.

*Obligatory:* this model was `Built with Llama`.

## Quick start

Simply load the model and generate responses:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

model = AutoModelForCausalLM.from_pretrained("mathewhe/Llama-3.1-8B-Chat")
tokenizer = AutoTokenizer.from_pretrained("mathewhe/Llama-3.1-8B-Chat")

messages = [
    {"role": "user", "content": "What is an LLM?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
print(tokenizer.decode(model.generate(**inputs)[0]))
```

Alternatively, copy the included `chat_class.py` module into your local directory and just import the `Chat` class:

```python
from chat_class import Chat

chat = Chat(
    "mathewhe/Llama-3.1-8B-Chat",
    device="cuda",
)

# for one-off instructions
instruction = "Write an ingredient list for banana pudding."
print(chat.instruct(instruction))

# for multi-turn chat
response1 = chat.message("Hi, please explain what DNA is.")
response2 = chat.message("Tell me more about how its discovery affected society.")

# to reset the chat
chat.reset()
```

## Performance

We verified that this model was successfully aligned for both multi-turn dialogue and one-off instruction following.

- Note that this model generates relatively short completions, which leads to a low win rate on [AlpacaEval](https://github.com/tatsu-lab/alpaca_eval) due to that benchmark's known length bias.
- However, it achieves a [length-corrected win rate](https://arxiv.org/abs/2404.04475) on par with that of Meta's [8B instruction variant](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), which was trained on an unreleased dataset.

| Model                                  | AlpacaEval | AlpacaEval-LC |
|----------------------------------------|------------|---------------|
| meta-llama/Meta-Llama-3.1-8B-Instruct  | 21.84      | **20.85**     |
| mathewhe/Llama-3.1-8B-Chat             | 12.16      | **20.53**     |

## Chat template

This model uses the following chat template and does not support a separate system prompt:

```
<|begin_of_text|>[INST][/INST][ASST][/ASST]<|end_of_text|>
```

The included tokenizer will correctly format messages, so you should not have to manually format the input text. Instead, use the tokenizer's `apply_chat_template()` method on a list of messages. Each message should be a dict with two keys:

- `"role"`: Either `"user"` or `"assistant"`.
- `"content"`: The message to include.

For example:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mathewhe/Llama-3.1-8B-Chat")

messages = [
    {"role": "user", "content": "Solve for x: 3x=4"},
    {"role": "assistant", "content": "3x=4\n(3x)/3=(4)/3\nx=4/3"},
    {"role": "user", "content": "Please explain your work."},
]

print(tokenizer.apply_chat_template(messages, tokenize=False))
```

outputs

```
<|begin_of_text|>[INST]Solve for x: 3x=4[/INST][ASST]3x=4
(3x)/3=(4)/3
x=4/3[/ASST]<|end_of_text|><|begin_of_text|>[INST]Please explain your work.[/INST]
```

See the example code in the included `chat_class.py` module for more details.
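If you need to build prompts outside of `transformers` (for example, in a custom inference server), the template can also be reproduced with plain string formatting. The following is a minimal sketch based on the template and example output shown above; the `format_chat` helper is our own illustration and is not part of the released `chat_class.py`:

```python
# Minimal sketch: manually reproduce this model card's chat template.
# The tag layout follows the template shown above; format_chat is a
# hypothetical helper, not part of the released chat_class.py module.

def format_chat(messages: list[dict[str, str]]) -> str:
    """Format a list of {"role", "content"} dicts into a prompt string."""
    text = ""
    # Each completed (user, assistant) pair becomes one closed turn.
    for user, assistant in zip(messages[::2], messages[1::2]):
        text += (
            f"<|begin_of_text|>[INST]{user['content']}[/INST]"
            f"[ASST]{assistant['content']}[/ASST]<|end_of_text|>"
        )
    # A trailing user message awaiting a response gets an open [INST] turn.
    if len(messages) % 2 == 1:
        text += f"<|begin_of_text|>[INST]{messages[-1]['content']}[/INST]"
    return text

messages = [
    {"role": "user", "content": "Solve for x: 3x=4"},
    {"role": "assistant", "content": "3x=4\n(3x)/3=(4)/3\nx=4/3"},
    {"role": "user", "content": "Please explain your work."},
]
print(format_chat(messages))  # matches the tokenizer's output above
```

For tokenized inputs, prefer `apply_chat_template()`; the tokenizer guarantees the special tokens are encoded correctly.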
## Data

This model was trained on the following three datasets:

- [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
- [mathewhe/OpenHermes-2.5-Formatted](https://huggingface.co/datasets/mathewhe/OpenHermes-2.5-Formatted) (`nosys` configuration)
- [princeton-nlp/gemma2-ultrafeedback-armorm](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm)
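For reference, the training mixture can be inspected with the `datasets` library. This is a minimal sketch; the split names below are assumptions based on each dataset's default configuration, so verify them against the individual dataset cards:

```python
from datasets import load_dataset

# Multi-turn dialogue data ("train_sft" is this dataset's SFT split).
ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

# The "nosys" configuration omits system prompts, matching this model's
# chat template, which has no separate system role ("train" split assumed).
openhermes = load_dataset(
    "mathewhe/OpenHermes-2.5-Formatted", "nosys", split="train"
)

# Preference data for alignment ("train" split assumed).
ultrafeedback = load_dataset(
    "princeton-nlp/gemma2-ultrafeedback-armorm", split="train"
)

print(len(ultrachat), len(openhermes), len(ultrafeedback))
```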