---
base_model: meta-llama/Meta-Llama-3.1-8B
datasets:
- HuggingFaceH4/ultrachat_200k
- mathewhe/OpenHermes-2.5-Formatted
- princeton-nlp/gemma2-ultrafeedback-armorm
license: llama3.1
tags:
- text
---

# Llama-3.1-8B-Chat

`meta-llama/Meta-Llama-3.1-8B` fine-tuned for chat completions.

*Obligatory:* this model was `Built with Llama`.

## Quick start

Simply load the model and generate responses:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

model = AutoModelForCausalLM.from_pretrained("mathewhe/Llama-3.1-8B-Chat")
tokenizer = AutoTokenizer.from_pretrained("mathewhe/Llama-3.1-8B-Chat")

messages = [
    {"role": "user", "content": "What is an LLM?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
print(tokenizer.decode(model.generate(**inputs)[0]))
```

Alternatively, copy the included `chat_class.py` module into your local directory and just import the `Chat` class:

```python
from chat_class import Chat

chat = Chat(
    "mathewhe/Llama-3.1-8B-Chat",
    device="cuda",
)

# for one-off instructions
instruction = "Write an ingredient list for banana pudding."
print(chat.instruct(instruction))

# for multi-turn chat
response1 = chat.message("Hi, please explain what DNA is.")
response2 = chat.message("Tell me more about how its discovery affected society.")

# to reset the chat
chat.reset()
```

## Performance

We verified that this model was successfully aligned for both multi-turn dialogue and one-off instruction following.

- Note that this model generates relatively short completions, which leads to a low win rate on [AlpacaEval](https://github.com/tatsu-lab/alpaca_eval) due to that benchmark's known length bias.
- However, it achieves a [length-corrected win rate](https://arxiv.org/abs/2404.04475) on par with that of Meta's [8B instruction variant](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), which was trained on an unreleased dataset.

| Model                                  | AlpacaEval | AlpacaEval-LC |
|----------------------------------------|------------|---------------|
| meta-llama/Meta-Llama-3.1-8B-Instruct  | 21.84      | **20.85**     |
| mathewhe/Llama-3.1-8B-Chat             | 12.16      | **20.53**     |

## Chat template

This model uses the following chat template and does not support a separate system prompt:

```
<|begin_of_text|>[INST][/INST][ASST][/ASST]<|end_of_text|>
```

The included tokenizer will correctly format messages, so you should not have to manually format the input text. Instead, use the tokenizer's `apply_chat_template()` method on a list of messages. Each message should be a dict with two keys:

- `"role"`: Either `"user"` or `"assistant"`.
- `"content"`: The message to include.

For example:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mathewhe/Llama-3.1-8B-Chat")

messages = [
    {"role": "user", "content": "Solve for x: 3x=4"},
    {"role": "assistant", "content": "3x=4\n(3x)/3=(4)/3\nx=4/3"},
    {"role": "user", "content": "Please explain your work."},
]

print(tokenizer.apply_chat_template(messages, tokenize=False))
```

outputs

```
<|begin_of_text|>[INST]Solve for x: 3x=4[/INST][ASST]3x=4
(3x)/3=(4)/3
x=4/3[/ASST]<|end_of_text|><|begin_of_text|>[INST]Please explain your work.[/INST]
```

See the example code in the included `chat_class.py` module for more details.
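If you need to build prompts outside of `transformers` (for example, in a custom inference server), the template can also be reproduced with plain string formatting. The following is a minimal sketch based on the template and example output shown above; the `format_chat` helper is our own illustration and is not part of the released `chat_class.py`:

```python
# Minimal sketch: manually reproduce this model card's chat template.
# The tag layout follows the template shown above; format_chat is a
# hypothetical helper, not part of the released chat_class.py module.

def format_chat(messages: list[dict[str, str]]) -> str:
    """Format a list of {"role", "content"} dicts into a prompt string."""
    text = ""
    # Each completed (user, assistant) pair becomes one closed turn.
    for user, assistant in zip(messages[::2], messages[1::2]):
        text += (
            f"<|begin_of_text|>[INST]{user['content']}[/INST]"
            f"[ASST]{assistant['content']}[/ASST]<|end_of_text|>"
        )
    # A trailing user message awaiting a response gets an open [INST] turn.
    if len(messages) % 2 == 1:
        text += f"<|begin_of_text|>[INST]{messages[-1]['content']}[/INST]"
    return text

messages = [
    {"role": "user", "content": "Solve for x: 3x=4"},
    {"role": "assistant", "content": "3x=4\n(3x)/3=(4)/3\nx=4/3"},
    {"role": "user", "content": "Please explain your work."},
]
print(format_chat(messages))  # matches the tokenizer's output above
```

For tokenized inputs, prefer `apply_chat_template()`; the tokenizer guarantees the special tokens are encoded correctly.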
## Data

This model was trained on the following three datasets:

- [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
- [mathewhe/OpenHermes-2.5-Formatted](https://huggingface.co/datasets/mathewhe/OpenHermes-2.5-Formatted) (`nosys` configuration)
- [princeton-nlp/gemma2-ultrafeedback-armorm](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm)
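For reference, the training mixture can be inspected with the `datasets` library. This is a minimal sketch; the split names below are assumptions based on each dataset's default configuration, so verify them against the individual dataset cards:

```python
from datasets import load_dataset

# Multi-turn dialogue data ("train_sft" is this dataset's SFT split).
ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

# The "nosys" configuration omits system prompts, matching this model's
# chat template, which has no separate system role ("train" split assumed).
openhermes = load_dataset(
    "mathewhe/OpenHermes-2.5-Formatted", "nosys", split="train"
)

# Preference data for alignment ("train" split assumed).
ultrafeedback = load_dataset(
    "princeton-nlp/gemma2-ultrafeedback-armorm", split="train"
)

print(len(ultrachat), len(openhermes), len(ultrafeedback))
```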