---
license: apache-2.0
---
<h1 align="center"> Moxin Chat 7B </h1>
<p align="center"> <a href="https://github.com/moxin-org/Moxin-LLM">Home Page</a>    |    <a href="https://arxiv.org/abs/2412.06845">Technical Report</a>    |    <a href="https://huggingface.co/moxin-org/moxin-llm-7b">Base Model</a>    |    <a href="https://huggingface.co/moxin-org/moxin-chat-7b">Chat Model</a> </p>
## Model
You can download our base 7B model from this [link](https://huggingface.co/moxin-org/moxin-llm-7b) and our chat 7B model from this [link](https://huggingface.co/moxin-org/moxin-chat-7b).
## Inference
You can use the following code to run inference with the model. The code loads the weights from the Hugging Face Hub by ID (`moxin-org/moxin-chat-7b`). To use a local copy instead, for example one saved under the `./model/` directory, set `model_name` to that directory.
```
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Disable the memory-efficient and flash scaled-dot-product attention kernels.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

# Hugging Face Hub ID; replace with a local directory to load from disk.
model_name = 'moxin-org/moxin-chat-7b'

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Wrap the loaded model and tokenizer in a text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Can you explain the concept of regularization in machine learning?"

# Sample one completion of up to 1000 new tokens.
sequences = pipe(
    prompt,
    do_sample=True,
    max_new_tokens=1000,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1,
)
print(sequences[0]['generated_text'])
```
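
The choice between a local directory and the Hub ID can be made explicit with a small helper. The following is a minimal sketch, not part of the Moxin code base: `resolve_model_source` is a hypothetical name, and both the `./model/` default and the check for `config.json` are assumptions about how a local copy would be laid out.

```python
import os

HUB_ID = "moxin-org/moxin-chat-7b"  # Hub ID of the chat model


def resolve_model_source(local_dir="./model/", hub_id=HUB_ID):
    """Return local_dir if it looks like a saved model, otherwise the Hub ID.

    Hypothetical helper: a directory counts as a saved model here when it
    contains a config.json, which save_pretrained() writes.
    """
    if os.path.isfile(os.path.join(local_dir, "config.json")):
        return local_dir
    return hub_id


model_name = resolve_model_source()
print(model_name)
```

The returned value can be passed unchanged to both `AutoTokenizer.from_pretrained` and `AutoModelForCausalLM.from_pretrained`.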
## Chat template
The tokenizer ships with a chat template, which you can apply to a list of messages with the `apply_chat_template()` method:
```
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("moxin-org/moxin-chat-7b")
tokenizer = AutoTokenizer.from_pretrained("moxin-org/moxin-chat-7b")

# Conversation history; the final user message is the one the model answers.
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# Render the conversation with the chat template and tokenize it.
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

# Sample up to 1000 new tokens; the output holds the prompt followed by the reply.
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
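
For decoder-only models, `generate()` returns the prompt tokens followed by the newly generated ones, so the decoded string contains the whole conversation. If only the reply is wanted, the prompt can be sliced off before decoding. A minimal sketch, where `strip_prompt` is a hypothetical helper name and the token IDs in the demo are made-up placeholders rather than real tokenizer output:

```python
def strip_prompt(generated_ids, prompt_length):
    """Drop the first prompt_length tokens from every generated sequence.

    Works on nested lists and, since it only iterates and slices,
    on a 2-D tensor such as the one returned by model.generate().
    """
    return [row[prompt_length:] for row in generated_ids]


# Placeholder IDs: 3 prompt tokens followed by 2 generated tokens.
prompt_ids = [[1, 733, 16289]]
generated = [[1, 733, 16289, 5613, 2]]

reply_ids = strip_prompt(generated, prompt_length=len(prompt_ids[0]))
print(reply_ids)  # [[5613, 2]]
```

Assuming `model_inputs` is the 2-D tensor of prompt IDs, the reply alone could then be decoded with `tokenizer.batch_decode(strip_prompt(generated_ids, model_inputs.shape[1]), skip_special_tokens=True)`.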
## Evaluation
We evaluate our model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) on AI2 Reasoning Challenge (ARC-C, 25-shot), HellaSwag (10-shot), MMLU (5-shot), and Winogrande (5-shot). The results are shown below; the last column is the average of the four task scores.
| Models | ARC-C | HellaSwag | MMLU | WinoGrande | Average |
|:----------------------:|:-----:|:---------:|:-----:|:---------:|:-----:|
| Mistral-7B | 57.59 | 83.25 | 62.42 | 78.77 | 70.51 |
| LLaMA 3.1-8B | 54.61 | 81.95 | 65.16 | 77.35 | 69.77 |
| LLaMA 3-8B | 55.46 | 82.09 | 65.29 | 77.82 | 70.17 |
| LLaMA 2-7B | 49.74 | 78.94 | 45.89 | 74.27 | 62.21 |
| Qwen 2-7B | 57.68 | 80.76 | 70.42 | 77.43 | 71.57 |
| gemma-7b | 56.48 | 82.31 | 63.02 | 78.3 | 70.03 |
| internlm2.5-7b | 54.78 | 79.7 | 68.17 | 80.9 | 70.89 |
| Baichuan2-7B | 47.87 | 73.89 | 54.13 | 70.8 | 61.67 |
| Yi-1.5-9B | 58.36 | 80.36 | 69.54 | 77.53 | 71.48 |
| Moxin-7B-original | 53.75 | 75.46 | 59.43 | 70.32 | 64.74 |
| Moxin-7B-finetuned | 59.47 | 83.08 | 60.97 | 78.69 | 70.55 |
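
The last column can be reproduced from the four task scores. The following is a short check, assuming the column is the unweighted mean rounded to two decimals (the table does not state this, but the numbers agree). The scores are copied from the rows above.

```python
# Scores copied from the table: ARC-C, HellaSwag, MMLU, Winogrande.
few_shot_scores = {
    "Mistral-7B": [57.59, 83.25, 62.42, 78.77],
    "Moxin-7B-original": [53.75, 75.46, 59.43, 70.32],
    "Moxin-7B-finetuned": [59.47, 83.08, 60.97, 78.69],
}


def average(scores):
    """Unweighted mean of the task scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


for name, scores in few_shot_scores.items():
    print(f"{name}: {average(scores):.2f}")
```
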
We also test zero-shot performance on AI2 Reasoning Challenge (ARC-C), AI2 Reasoning Easy (ARC-E), HellaSwag, PIQA, and Winogrande. The results are shown below.
| Models | HellaSwag | WinoGrande | PIQA | ARC-E | ARC-C | Average |
|:-----------------:|:---------:|:---------:|:-----:|:-----:|:-----:|:-----:|
| Mistral-7B | 80.39 | 73.4 | 82.15 | 78.28 | 52.22 | 73.29 |
| LLaMA 2-7B | 75.99 | 69.06 | 79.11 | 74.54 | 46.42 | 69.02 |
| LLaMA 2-13B | 79.37 | 72.22 | 80.52 | 77.4 | 49.06 | 71.71 |
| LLaMA 3.1-8B | 78.92 | 74.19 | 81.12 | 81.06 | 53.67 | 73.79 |
| gemma-7b | 80.45 | 73.72 | 80.9 | 79.97 | 54.1 | 73.83 |
| Qwen 2-7B | 78.9 | 72.38 | 79.98 | 74.71 | 50.09 | 71.21 |
| internlm2.5-7b | 79.14 | 77.9 | 80.52 | 76.16 | 51.37 | 73.02 |
| Baichuan2-7B | 72.25 | 67.17 | 77.26 | 72.98 | 42.15 | 66.36 |
| Yi-1.5-9B | 77.86 | 73.01 | 80.74 | 79.04 | 55.03 | 73.14 |
| deepseek-7b | 76.13 | 69.77 | 79.76 | 71.04 | 44.8 | 68.3 |
| Moxin-7B-original | 72.06 | 66.31 | 78.07 | 71.47 | 48.15 | 67.21 |
| Moxin-7B-finetuned | 80.03 | 75.17 | 82.24 | 81.12 | 58.64 | 75.44 |
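
Both evaluation settings can be described as a mapping from harness task to number of few-shot examples. The sketch below is not the authors' evaluation script. The task names (`arc_challenge`, `hellaswag`, `mmlu`, `winogrande`, `piqa`, `arc_easy`) are the lm-evaluation-harness names assumed to correspond to the benchmarks above, `run_setting` is a hypothetical wrapper, and the call to `lm_eval.simple_evaluate` assumes a 0.4-series release of the harness. The metric reported in the tables (for example `acc` or `acc_norm`) is not stated, so the sketch returns the raw results.

```python
# Few-shot counts taken from the text; task names are assumed harness names.
FEW_SHOT = {"arc_challenge": 25, "hellaswag": 10, "mmlu": 5, "winogrande": 5}
ZERO_SHOT = {name: 0 for name in
             ["hellaswag", "winogrande", "piqa", "arc_easy", "arc_challenge"]}


def run_setting(setting, model_id):
    """Evaluate each task with its own shot count (hypothetical wrapper).

    lm_eval is imported lazily so the configuration above can be
    inspected without the harness installed. The model is reloaded
    for every task, which is simple but not efficient.
    """
    import lm_eval  # pip install lm-eval

    results = {}
    for task, shots in setting.items():
        out = lm_eval.simple_evaluate(
            model="hf",
            model_args=f"pretrained={model_id},dtype=bfloat16",
            tasks=[task],
            num_fewshot=shots,
        )
        results[task] = out["results"]
    return results


print(FEW_SHOT)
print(ZERO_SHOT)
```

A call would look like `run_setting(FEW_SHOT, "moxin-org/moxin-chat-7b")`. Which checkpoint corresponds to which table row is not stated here, so the model ID is left as a required argument.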