JunxiongWang/Llama3.1-Mamba-8B-distill


---
license: apache-2.0
---

Zero-shot results using [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) as the teacher model and [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) as the initialization model:

| Task          | Llama-3.1-8B-Instruct | Llama3.1-Mamba-8B-distill | Llama3.1-Mamba-8B-dpo | Llama3.1-Mamba2-8B-distill | Llama3.1-Mamba2-8B-dpo |
|---------------|-----------------------|---------------------------|-----------------------|----------------------------|------------------------|
| arc_challenge | 0.552                 | 0.5384                    | 0.5657                | 0.5265                     | 0.5973                 |
| arc_easy      | 0.8178                | 0.8224                    | 0.8401                | 0.822                      | 0.8481                 |
| hellaswag     | 0.7921                | 0.7591                    | 0.7736                | 0.7536                     | 0.7969                 |
| mmlu (0 shot) | 0.6812                | 0.6213                    | 0.636                 | 0.6101                     | 0.5974                 |
| openbookqa    | 0.432                 | 0.428                     | 0.442                 | 0.416                      | 0.44                   |
| piqa          | 0.8079                | 0.7933                    | 0.8041                | 0.7889                     | 0.8003                 |
| pubmedqa      | 0.752                 | 0.72                      | 0.744                 | 0.726                      | 0.746                  |
| race          | 0.4478                | 0.4211                    | 0.4344                | 0.4211                     | 0.4612                 |
| winogrande    | 0.7388                | 0.7277                    | 0.738                 | 0.7174                     | 0.7411                 |
| truthful      | 0.4267                | 0.4002                    | 0.4607                | 0.4031                     | 0.5022                 |
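
To compare the columns at a glance, the scores above can be loaded into a small Python structure. The sketch below is illustrative only: the unweighted mean over the ten tasks and the per-task difference against Llama-3.1-8B-Instruct are summary choices made for this example, not metrics reported with the results, and the helper names (`mean_scores`, `deltas_vs_baseline`) are hypothetical.

```python
# Scores copied from the results table; list order matches MODELS.
MODELS = [
    "Llama-3.1-8B-Instruct",
    "Llama3.1-Mamba-8B-distill",
    "Llama3.1-Mamba-8B-dpo",
    "Llama3.1-Mamba2-8B-distill",
    "Llama3.1-Mamba2-8B-dpo",
]

SCORES = {
    "arc_challenge": [0.552, 0.5384, 0.5657, 0.5265, 0.5973],
    "arc_easy": [0.8178, 0.8224, 0.8401, 0.822, 0.8481],
    "hellaswag": [0.7921, 0.7591, 0.7736, 0.7536, 0.7969],
    "mmlu (0 shot)": [0.6812, 0.6213, 0.636, 0.6101, 0.5974],
    "openbookqa": [0.432, 0.428, 0.442, 0.416, 0.44],
    "piqa": [0.8079, 0.7933, 0.8041, 0.7889, 0.8003],
    "pubmedqa": [0.752, 0.72, 0.744, 0.726, 0.746],
    "race": [0.4478, 0.4211, 0.4344, 0.4211, 0.4612],
    "winogrande": [0.7388, 0.7277, 0.738, 0.7174, 0.7411],
    "truthful": [0.4267, 0.4002, 0.4607, 0.4031, 0.5022],
}


def mean_scores():
    """Return {model: unweighted mean over all tasks} (an illustrative summary)."""
    return {
        model: sum(row[i] for row in SCORES.values()) / len(SCORES)
        for i, model in enumerate(MODELS)
    }


def deltas_vs_baseline(model, baseline=MODELS[0]):
    """Return {task: model score minus baseline score}."""
    m, b = MODELS.index(model), MODELS.index(baseline)
    return {task: row[m] - row[b] for task, row in SCORES.items()}


# Print one line per model: name and unweighted mean.
for name, avg in mean_scores().items():
    print(f"{name:28s} {avg:.4f}")
```

Calling `deltas_vs_baseline("Llama3.1-Mamba-8B-distill")` returns the per-task gap for this model; positive values mark tasks where it scores above the Llama-3.1-8B-Instruct column.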

```
@article{junxiongdaniele2024mambainllama,
  title   = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
  author  = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
  journal = {arXiv preprint arXiv:2408.15237},
  year    = {2024}
}
```