Llama 3 8B finetuned on mlabonne/orpo-dpo-mix-40k with ORPO.

The maximum sequence length was reduced to 1024 tokens, and LoRA (r=16) with 4-bit quantization was used to reduce memory usage during training.
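The sketch below illustrates how such a run could be set up, assuming TRL's `ORPOTrainer` together with `peft` and `bitsandbytes`. Only the details stated above (ORPO, the dataset, max length 1024, LoRA r=16, 4-bit quantization) come from this card; every other hyperparameter, path, and module choice is an illustrative placeholder, not the exact configuration used.

```python
# Hypothetical ORPO + QLoRA training sketch; hyperparameters are placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

base_model = "meta-llama/Meta-Llama-3-8B"

# 4-bit (NF4) quantization so the 8B model fits in limited GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)

# LoRA adapter with rank 16, as stated above; alpha/dropout/targets are assumed
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Preference pairs (prompt / chosen / rejected); depending on the TRL version the
# chosen/rejected conversations may need the chat template applied beforehand.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

orpo_args = ORPOConfig(
    output_dir="./llama3-8b-orpo",
    max_length=1024,          # reduced sequence length noted above
    max_prompt_length=512,    # assumed prompt/completion split
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=8e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```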
| Benchmark | Llama 3 8B | Llama 3 8B Instruct | Llama 3 8B ORPO V1 | Llama 3 8B ORPO V2 (WIP) |
|---|---|---|---|---|
| MMLU | 62.12 | 63.92 | 61.87 | |
| BoolQ | 81.04 | 83.21 | 82.42 | |
| Winogrande | 73.24 | 72.06 | 74.43 | |
| ARC-Challenge | 53.24 | 56.91 | 52.90 | |
| TriviaQA | 63.33 | 51.09 | 63.93 | |
| GSM-8K (flexible) | 50.27 | 75.13 | 52.16 | |
| SQuAD V2 (f1) | 32.48 | 29.68 | 33.68 | |
| LogiQA | 29.23 | 32.87 | 30.26 | |
All scores were obtained with lm-evaluation-harness v0.4.2.
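A minimal evaluation sketch using the lm-evaluation-harness Python API is shown below; the model path and batch size are placeholders, and task names can differ slightly between harness versions, so this is an assumption about the setup rather than the exact command used.

```python
# Hypothetical reproduction sketch with lm-evaluation-harness (v0.4.x Python API).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    # Placeholder path to the finetuned checkpoint or adapter-merged model
    model_args="pretrained=./llama3-8b-orpo,dtype=bfloat16",
    tasks=[
        "mmlu", "boolq", "winogrande", "arc_challenge",
        "triviaqa", "gsm8k", "squadv2", "logiqa",
    ],
    batch_size=8,
)
print(results["results"])
```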