Zero-shot results using Llama-3.1-70B-Instruct as the teacher model and Llama-3.1-8B-Instruct as the initialization:

| Task | Llama-3.1-8B-Instruct | Llama3.1-Mamba-8B-distill | Llama3.1-Mamba-8B-dpo | Llama3.1-Mamba2-8B-distill | Llama3.1-Mamba2-8B-dpo |
|---|---|---|---|---|---|
| arc_challenge | 0.552 | 0.5384 | 0.5657 | 0.5265 | 0.5973 |
| arc_easy | 0.8178 | 0.8224 | 0.8401 | 0.822 | 0.8481 |
| hellaswag | 0.7921 | 0.7591 | 0.7736 | 0.7536 | 0.7969 |
| mmlu (0-shot) | 0.6812 | 0.6213 | 0.636 | 0.6101 | 0.5974 |
| openbookqa | 0.432 | 0.428 | 0.442 | 0.416 | 0.44 |
| piqa | 0.8079 | 0.7933 | 0.8041 | 0.7889 | 0.8003 |
| pubmedqa | 0.752 | 0.72 | 0.744 | 0.726 | 0.746 |
| race | 0.4478 | 0.4211 | 0.4344 | 0.4211 | 0.4612 |
| winogrande | 0.7388 | 0.7277 | 0.738 | 0.7174 | 0.7411 |
| truthfulqa | 0.4267 | 0.4002 | 0.4607 | 0.4031 | 0.5022 |
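
The task names above match those used by EleutherAI's lm-evaluation-harness, so a run like the sketch below should produce comparable numbers. This is a minimal illustration, not the authors' exact evaluation command: the harness version, the `truthfulqa` task grouping, and in particular the plain `hf` loader are assumptions, and the hybrid Mamba checkpoints may instead require the custom model code from the MambaInLlama repository.

```python
# Hedged sketch: zero-shot evaluation with lm-evaluation-harness
# (pip install lm-eval). Assumption: this checkpoint loads through
# the standard "hf" backend; if it needs the MambaInLlama custom
# model code, swap in that loader instead.
import lm_eval

TASKS = [
    "arc_challenge", "arc_easy", "hellaswag", "mmlu", "openbookqa",
    "piqa", "pubmedqa", "race", "winogrande", "truthfulqa",
]

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=JunxiongWang/Llama3.1-Mamba2-8B-distill",
    tasks=TASKS,
    num_fewshot=0,  # all scores in the table are zero-shot
    batch_size=8,
)

# Per-task metrics live under results["results"]
for task, metrics in results["results"].items():
    print(task, metrics)
```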
```bibtex
@article{junxiongdaniele2024mambainllama,
  title   = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
  author  = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
  journal = {arXiv preprint arXiv:2408.15237},
  year    = {2024}
}
```