Llama 3 8B finetuned on mlabonne/orpo-dpo-mix-40k with ORPO.

The maximum sequence length was reduced to 1024 tokens, and LoRA (r=16) with 4-bit quantization was used to reduce memory usage during training.
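The sketch below illustrates how such a run could be set up, assuming TRL's `ORPOTrainer` together with `peft` and `bitsandbytes`. Only the details stated above (ORPO, the dataset, max length 1024, LoRA r=16, 4-bit quantization) come from this card; every other hyperparameter, path, and module choice is an illustrative placeholder, not the exact configuration used.

```python
# Hypothetical ORPO + QLoRA training sketch; hyperparameters are placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

base_model = "meta-llama/Meta-Llama-3-8B"

# 4-bit (NF4) quantization so the 8B model fits in limited GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)

# LoRA adapter with rank 16, as stated above; alpha/dropout/targets are assumed
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Preference pairs (prompt / chosen / rejected); depending on the TRL version the
# chosen/rejected conversations may need the chat template applied beforehand.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

orpo_args = ORPOConfig(
    output_dir="./llama3-8b-orpo",
    max_length=1024,          # reduced sequence length noted above
    max_prompt_length=512,    # assumed prompt/completion split
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=8e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```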
| Benchmark | Llama 3 8B | Llama 3 8B Instruct | Llama 3 8B ORPO V1 | Llama 3 8B ORPO V2 (WIP) |
|---|---|---|---|---|
| MMLU | 62.12 | 63.92 | 61.87 | |
| BoolQ | 81.04 | 83.21 | 82.42 | |
| Winogrande | 73.24 | 72.06 | 74.43 | |
| ARC-Challenge | 53.24 | 56.91 | 52.90 | |
| TriviaQA | 63.33 | 51.09 | 63.93 | |
| GSM-8K (flexible) | 50.27 | 75.13 | 52.16 | |
| SQuAD V2 (f1) | 32.48 | 29.68 | 33.68 | |
| LogiQA | 29.23 | 32.87 | 30.26 | |
All scores were obtained with lm-evaluation-harness v0.4.2.
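A minimal evaluation sketch using the lm-evaluation-harness Python API is shown below; the model path and batch size are placeholders, and task names can differ slightly between harness versions, so this is an assumption about the setup rather than the exact command used.

```python
# Hypothetical reproduction sketch with lm-evaluation-harness (v0.4.x Python API).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    # Placeholder path to the finetuned checkpoint or adapter-merged model
    model_args="pretrained=./llama3-8b-orpo,dtype=bfloat16",
    tasks=[
        "mmlu", "boolq", "winogrande", "arc_challenge",
        "triviaqa", "gsm8k", "squadv2", "logiqa",
    ],
    batch_size=8,
)
print(results["results"])
```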