---
license: apache-2.0
---
Zero-shot results when using [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) as the teacher model and [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) as the initialized model:
| Task | Llama-3.1-8B-Instruct | Llama3.1-Mamba-8B-distill | Llama3.1-Mamba-8B-dpo | Llama3.1-Mamba2-8B-distill | Llama3.1-Mamba2-8B-dpo |
|---------------------|-----------------------|--------------------------|-----------------------|---------------------------|-----------------------|
| arc_challenge | 0.552 | 0.5384 | 0.5657 | 0.5265 | 0.5973 |
| arc_easy | 0.8178 | 0.8224 | 0.8401 | 0.822 | 0.8481 |
| hellaswag | 0.7921 | 0.7591 | 0.7736 | 0.7536 | 0.7969 |
| mmlu (0 shot) | 0.6812 | 0.6213 | 0.636 | 0.6101 | 0.5974 |
| openbookqa | 0.432 | 0.428 | 0.442 | 0.416 | 0.44 |
| piqa | 0.8079 | 0.7933 | 0.8041 | 0.7889 | 0.8003 |
| pubmedqa | 0.752 | 0.72 | 0.744 | 0.726 | 0.746 |
| race | 0.4478 | 0.4211 | 0.4344 | 0.4211 | 0.4612 |
| winogrande | 0.7388 | 0.7277 | 0.738 | 0.7174 | 0.7411 |
| truthfulqa | 0.4267 | 0.4002 | 0.4607 | 0.4031 | 0.5022 |
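
The scores above are zero-shot accuracies of the kind typically produced with EleutherAI's lm-evaluation-harness. Below is a minimal reproduction sketch for the baseline column; the backend, task names, and batch size are assumptions, and the distilled hybrid Mamba checkpoints may require the loading code and evaluation scripts from the MambaInLlama repository rather than the plain `hf` backend.

```python
# Hypothetical sketch: reproducing zero-shot scores with lm-evaluation-harness.
# Install first: pip install lm-eval
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # standard Hugging Face backend (assumption for the baseline model)
    model_args="pretrained=meta-llama/Llama-3.1-8B-Instruct,dtype=bfloat16",
    tasks=[
        "arc_challenge", "arc_easy", "hellaswag", "mmlu",
        "openbookqa", "piqa", "pubmedqa", "race",
        "winogrande", "truthfulqa_mc2",
    ],
    num_fewshot=0,   # zero-shot, matching the table above
    batch_size=8,
)

# Print per-task metrics (accuracy and related scores).
for task, metrics in results["results"].items():
    print(task, metrics)
```

If you use these models, please cite the paper below.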
```bibtex
@article{junxiongdaniele2024mambainllama,
title = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
author = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
journal = {arXiv preprint arXiv:2408.15237},
year = {2024}
}
```