---
license: apache-2.0
---

<h1 align="center"> Moxin Chat 7B </h1>

<p align="center"> <a href="https://github.com/moxin-org/Moxin-LLM">Home Page</a>    |    <a href="https://arxiv.org/abs/2412.06845">Technical Report</a>    |    <a href="https://huggingface.co/moxin-org/moxin-llm-7b">Base Model</a>    |    <a href="https://huggingface.co/moxin-org/moxin-chat-7b">Chat Model</a> </p>

## Model

You can download our base 7B model from [moxin-org/moxin-llm-7b](https://huggingface.co/moxin-org/moxin-llm-7b) and our chat 7B model from [moxin-org/moxin-chat-7b](https://huggingface.co/moxin-org/moxin-chat-7b).
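
If you want a local copy (an option used in the inference example below), here is a minimal sketch using the `huggingface_hub` client; the `./model/` target directory is arbitrary:

```
from huggingface_hub import snapshot_download

# Download the chat model weights and tokenizer files to a local directory.
snapshot_download(repo_id="moxin-org/moxin-chat-7b", local_dir="./model/")
```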

## Inference

You can use the following code to run inference with the model. The example loads the model directly from the Hugging Face Hub; if you have downloaded it to a local directory such as `./model/`, set `model_name` to that path instead.

```
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Disable the fused attention kernels, which can cause issues with this model.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

# Load the tokenizer and model from the Hub (or a local path).
model_name = 'moxin-org/moxin-chat-7b'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Can you explain the concept of regularization in machine learning?"

# Sample a single completion of up to 1000 new tokens.
sequences = pipe(
    prompt,
    do_sample=True,
    max_new_tokens=1000,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1,
)
print(sequences[0]['generated_text'])
```
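
The sampling settings above (temperature, top-k, top-p) are one reasonable configuration. If you want deterministic, reproducible output instead, a greedy-decoding variant of the same call looks like this:

```
# Greedy decoding: deterministic output, no sampling hyperparameters needed.
sequences = pipe(prompt, do_sample=False, max_new_tokens=1000)
print(sequences[0]['generated_text'])
```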

## Chat template

The chat template is available via the `apply_chat_template()` method:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

model = AutoModelForCausalLM.from_pretrained("moxin-org/moxin-chat-7b")
tokenizer = AutoTokenizer.from_pretrained("moxin-org/moxin-chat-7b")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Apply the chat template to turn the conversation into model input ids.
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

# Generate a response and decode it back to text.
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
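
To inspect the prompt string the template produces rather than token ids, you can render the same conversation without tokenizing:

```
# Render the conversation to plain text with the same chat template.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```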

## Evaluation

We test the performance of our model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). The evaluation results on common benchmarks are shown below: AI2 Reasoning Challenge (25-shot), HellaSwag (10-shot), MMLU (5-shot), and Winogrande (5-shot).

| Models | ARC-C | HellaSwag | MMLU | WinoGrande | Avg |
|:----------------------:|:-----:|:---------:|:-----:|:---------:|:-----:|
| Mistral-7B | 57.59 | 83.25 | 62.42 | 78.77 | 70.51 |
| LLaMA 3.1-8B | 54.61 | 81.95 | 65.16 | 77.35 | 69.77 |
| LLaMA 3-8B | 55.46 | 82.09 | 65.29 | 77.82 | 70.17 |
| LLaMA 2-7B | 49.74 | 78.94 | 45.89 | 74.27 | 62.21 |
| Qwen 2-7B | 57.68 | 80.76 | 70.42 | 77.43 | 71.57 |
| gemma-7b | 56.48 | 82.31 | 63.02 | 78.3 | 70.03 |
| internlm2.5-7b | 54.78 | 79.7 | 68.17 | 80.9 | 70.89 |
| Baichuan2-7B | 47.87 | 73.89 | 54.13 | 70.8 | 61.67 |
| Yi-1.5-9B | 58.36 | 80.36 | 69.54 | 77.53 | 71.48 |
| Moxin-7B-original | 53.75 | 75.46 | 59.43 | 70.32 | 64.74 |
| Moxin-7B-finetuned | 59.47 | 83.08 | 60.97 | 78.69 | 70.55 |
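
For reference, here is a minimal sketch of reproducing one of the few-shot numbers above with the lm-evaluation-harness Python API (v0.4+); exact scores may vary slightly with harness version and hardware:

```
import lm_eval

# Evaluate on ARC-Challenge with the 25-shot setting used in the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=moxin-org/moxin-chat-7b,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
)
print(results["results"]["arc_challenge"])
```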

We also test zero-shot performance on AI2 Reasoning Challenge (ARC-C), ARC-Easy (ARC-E), HellaSwag, PIQA, and Winogrande, all 0-shot. The results are shown below.

| Models | HellaSwag | WinoGrande | PIQA | ARC-E | ARC-C | Avg |
|:-----------------:|:---------:|:---------:|:-----:|:-----:|:-----:|:-----:|
| Mistral-7B | 80.39 | 73.4 | 82.15 | 78.28 | 52.22 | 73.29 |
| LLaMA 2-7B | 75.99 | 69.06 | 79.11 | 74.54 | 46.42 | 69.02 |
| LLaMA 2-13B | 79.37 | 72.22 | 80.52 | 77.4 | 49.06 | 71.71 |
| LLaMA 3.1-8B | 78.92 | 74.19 | 81.12 | 81.06 | 53.67 | 73.79 |
| gemma-7b | 80.45 | 73.72 | 80.9 | 79.97 | 54.1 | 73.83 |
| Qwen 2-7B | 78.9 | 72.38 | 79.98 | 74.71 | 50.09 | 71.21 |
| internlm2.5-7b | 79.14 | 77.9 | 80.52 | 76.16 | 51.37 | 73.02 |
| Baichuan2-7B | 72.25 | 67.17 | 77.26 | 72.98 | 42.15 | 66.36 |
| Yi-1.5-9B | 77.86 | 73.01 | 80.74 | 79.04 | 55.03 | 73.14 |
| deepseek-7b | 76.13 | 69.77 | 79.76 | 71.04 | 44.8 | 68.3 |
| Moxin-7B-original | 72.06 | 66.31 | 78.07 | 71.47 | 48.15 | 67.21 |
| Moxin-7B-finetuned | 80.03 | 75.17 | 82.24 | 81.12 | 58.64 | 75.44 |