---
license: apache-2.0
---


<h1 align="center"> Moxin Chat 7B </h1>

<p align="center"> <a href="https://github.com/moxin-org/Moxin-LLM">Home Page</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://github.com/moxin-org/Moxin-LLM/blob/main/report/Moxin_7B.pdf">Technical Report</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/moxin-org/moxin-llm-7b">Base Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/moxin-org/moxin-chat-7b">Chat Model</a>  </p>




## Model
You can download our base 7B model from [this link](https://huggingface.co/moxin-org/moxin-llm-7b) and our chat 7B model from [this link](https://huggingface.co/moxin-org/moxin-chat-7b).
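
If you prefer to fetch the weights to a local directory first (for example, the `./model/` path referenced in the Inference section below), `huggingface_hub` can download a full snapshot. This is a minimal sketch; the target directory is just an example.

```python
# Optional: download the chat model to a local directory with huggingface_hub.
# The target directory "./model/moxin-chat-7b" is an arbitrary example path.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="moxin-org/moxin-chat-7b",
    local_dir="./model/moxin-chat-7b",
)
print(f"Model files downloaded to {local_dir}")
```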
 


## Inference

You can use the following code to run inference with the model. The example loads the model directly from the Hugging Face Hub; if you have downloaded the weights to a local directory (for example `./model/`), point `model_name` at that path instead.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Disable the memory-efficient and flash scaled-dot-product attention kernels.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

# Load the tokenizer and model from the Hugging Face Hub (or a local path).
model_name = 'moxin-org/moxin-chat-7b'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Build a text-generation pipeline around the loaded model and tokenizer.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Can you explain the concept of regularization in machine learning?"

# Sample a single completion of up to 1000 new tokens.
sequences = pipe(
    prompt,
    do_sample=True,
    max_new_tokens=1000,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1,
)
print(sequences[0]['generated_text'])
```

## Chat template 

The chat template is available via the `apply_chat_template()` method:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

model = AutoModelForCausalLM.from_pretrained("moxin-org/moxin-chat-7b")
tokenizer = AutoTokenizer.from_pretrained("moxin-org/moxin-chat-7b")

# A short multi-turn conversation; the final user message is the one to answer.
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Render the conversation with the chat template and tokenize it.
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

# Generate a continuation of up to 1000 new tokens and decode it back to text.
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
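
If you want to inspect the exact prompt string the chat template produces (for example, to build prompts outside of `transformers`), the template can also be rendered to plain text instead of token IDs. A minimal sketch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("moxin-org/moxin-chat-7b")

messages = [
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# tokenize=False returns the formatted prompt as a string;
# add_generation_prompt=True appends the assistant-turn prefix where the
# template defines one (it may be a no-op for templates that do not use it).
prompt_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt_text)
```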


## Evaluation 

We evaluate the model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). The results on common benchmarks are shown below: AI2 Reasoning Challenge (25-shot), HellaSwag (10-shot), MMLU (5-shot), and Winogrande (5-shot).

|        Models        | ARC-C | HellaSwag |  MMLU | Winogrande |  Avg  |
|:--------------------:|:-----:|:---------:|:-----:|:----------:|:-----:|
|      Mistral-7B      | 57.59 |   83.25   | 62.42 |    78.77   | 70.51 |
|     LLaMA 3.1-8B     | 54.61 |   81.95   | 65.16 |    77.35   | 69.77 |
|      LLaMA 3-8B      | 55.46 |   82.09   | 65.29 |    77.82   | 70.17 |
|      LLaMA 2-7B      | 49.74 |   78.94   | 45.89 |    74.27   | 62.21 |
|       Qwen 2-7B      | 57.68 |   80.76   | 70.42 |    77.43   | 71.57 |
|       gemma-7b       | 56.48 |   82.31   | 63.02 |    78.30   | 70.03 |
|    internlm2.5-7b    | 54.78 |   79.70   | 68.17 |    80.90   | 70.89 |
|     Baichuan2-7B     | 47.87 |   73.89   | 54.13 |    70.80   | 61.67 |
|       Yi-1.5-9B      | 58.36 |   80.36   | 69.54 |    77.53   | 71.48 |
|   Moxin-7B-original  | 53.75 |   75.46   | 59.43 |    70.32   | 64.74 |
|  Moxin-7B-finetuned  | 59.47 |   83.08   | 60.97 |    78.69   | 70.55 |


We also evaluate zero-shot performance on AI2 Reasoning Challenge (0-shot), AI2 Reasoning Easy (0-shot), HellaSwag (0-shot), PIQA (0-shot), and Winogrande (0-shot). The results are shown below.

|       Models       | HellaSwag | Winogrande |  PIQA | ARC-E | ARC-C |  Avg  |
|:------------------:|:---------:|:----------:|:-----:|:-----:|:-----:|:-----:|
|     Mistral-7B     |   80.39   |    73.40   | 82.15 | 78.28 | 52.22 | 73.29 |
|     LLaMA 2-7B     |   75.99   |    69.06   | 79.11 | 74.54 | 46.42 | 69.02 |
|     LLaMA 2-13B    |   79.37   |    72.22   | 80.52 | 77.40 | 49.06 | 71.71 |
|    LLaMA 3.1-8B    |   78.92   |    74.19   | 81.12 | 81.06 | 53.67 | 73.79 |
|      gemma-7b      |   80.45   |    73.72   | 80.90 | 79.97 | 54.10 | 73.83 |
|     Qwen v2-7B     |   78.90   |    72.38   | 79.98 | 74.71 | 50.09 | 71.21 |
|   internlm2.5-7b   |   79.14   |    77.90   | 80.52 | 76.16 | 51.37 | 73.02 |
|    Baichuan2-7B    |   72.25   |    67.17   | 77.26 | 72.98 | 42.15 | 66.36 |
|      Yi-1.5-9B     |   77.86   |    73.01   | 80.74 | 79.04 | 55.03 | 73.14 |
|     deepseek-7b    |   76.13   |    69.77   | 79.76 | 71.04 | 44.80 | 68.30 |
|  Moxin-7B-original |   72.06   |    66.31   | 78.07 | 71.47 | 48.15 | 67.21 |
| Moxin-7B-finetuned |   80.03   |    75.17   | 82.24 | 81.12 | 58.64 | 75.44 |
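
To reproduce evaluations like the ones above, a loop over lm-evaluation-harness's Python API along the following lines should work; the task names and few-shot counts mirror the first table, and the snippet is an illustrative starting point rather than the exact setup we used.

```python
# Rough sketch of running the few-shot evaluations with lm-evaluation-harness
# (pip install lm-eval). Task names and few-shot counts follow the first table;
# adjust model_args and batch_size for your hardware.
import lm_eval

fewshot = {"arc_challenge": 25, "hellaswag": 10, "mmlu": 5, "winogrande": 5}

for task, n_shots in fewshot.items():
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=moxin-org/moxin-chat-7b,dtype=bfloat16",
        tasks=[task],
        num_fewshot=n_shots,
        batch_size=8,
    )
    print(task, results["results"][task])
```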