smol llama
Part of the collection: 🚧 "raw" pretrained smol_llama checkpoints - WIP 🚧
A small decoder-only model with 81M total parameters, kept compact by tying the input and output embeddings. This is the first version of the model.
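Weight tying means the language-modeling head reuses the same weight matrix as the token embedding, so the vocab-sized projection is only counted once in the parameter total. The snippet below is a minimal, illustrative sketch of that idea in PyTorch, not the model's actual implementation; the class name and sizes are placeholders.

```python
import torch.nn as nn

class TiedEmbeddingLM(nn.Module):
    """Illustrative sketch of input/output embedding tying (not smol_llama's code)."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        # Tie the output projection to the input embedding weights.
        self.lm_head.weight = self.embed_tokens.weight

    def forward(self, input_ids):
        hidden = self.embed_tokens(input_ids)  # (batch, seq, hidden)
        # ... transformer decoder layers would go here ...
        return self.lm_head(hidden)            # (batch, seq, vocab)

model = TiedEmbeddingLM(vocab_size=32000, hidden_size=768)
# Shared (tied) weights are deduplicated, so the vocab projection is counted once.
print(sum(p.numel() for p in model.parameters()))
```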
This checkpoint is the 'raw' pre-trained model and has not been tuned to a more specific task. It should be fine-tuned before use in most cases.
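As a quick sanity check before fine-tuning, the checkpoint can be loaded like any causal LM with 🤗 Transformers. The repo id below is an assumption for illustration; substitute the checkpoint name shown on this page.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; replace with the actual checkpoint name from this model card.
model_id = "BEE-spoke-data/smol_llama-81M-tied"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The raw pretrained checkpoint only continues text; it is not instruction-tuned.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```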
Detailed results (Open LLM Leaderboard evaluation) can be found here:
| Metric | Value |
|---|---|
| Avg. | 24.52 |
| ARC (25-shot) | 22.18 |
| HellaSwag (10-shot) | 29.33 |
| MMLU (5-shot) | 24.06 |
| TruthfulQA (0-shot) | 43.97 |
| Winogrande (5-shot) | 49.25 |
| GSM8K (5-shot) | 0.23 |
| DROP (3-shot) | 2.64 |