Model Card for Model ID
This model is a fine-tuned version of meta-llama/Llama-3.2-1B, trained with the ORPO (Odds Ratio Preference Optimization) trainer on the mlabonne/orpo-dpo-mix-40k dataset. Only 1,000 samples were used, to keep training fast.
Model Details
Model Description
The base model meta-llama/Llama-3.2-1B was fine-tuned with ORPO on a small subset of the mlabonne/orpo-dpo-mix-40k dataset. The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. This fine-tuned version aims to improve the model's understanding of the context in prompts and thereby increase the interpretability of the model.
- Fine-tuned from model: meta-llama/Llama-3.2-1B
- Model Size: 1 billion parameters
- Fine-tuning Method: ORPO
- Dataset: mlabonne/orpo-dpo-mix-40k
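To make the fine-tuning method above more concrete, here is a minimal sketch of the ORPO objective for a single preference pair. It is not the training code used for this model. It assumes the usual formulation in which the loss is the negative log-likelihood of the chosen response plus a weighted odds-ratio penalty; the function names, the length-averaged log-probability inputs, and the weight `lam=0.1` are illustrative assumptions.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp), with avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO loss for one (chosen, rejected) pair.

    Inputs are length-averaged token log-probabilities of each response
    under the model being trained (an assumption of this sketch).
    """
    # Supervised term: negative log-likelihood of the chosen response.
    nll = -chosen_avg_logp
    # Log odds ratio between the chosen and the rejected response.
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # Odds-ratio penalty: -log(sigmoid(x)) == log(1 + exp(-x)).
    or_penalty = math.log1p(math.exp(-log_odds_ratio))
    return nll + lam * or_penalty


if __name__ == "__main__":
    # The loss is lower when the model already prefers the chosen response.
    print(orpo_loss(-0.5, -2.0))
    print(orpo_loss(-2.0, -0.5))
```

Because the penalty is added to an ordinary supervised loss, this formulation does not need a separate reference model, which is one reason ORPO suits quick runs on small subsets.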
Evaluation
The model was evaluated zero-shot on the following benchmarks:
Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
---|---|---|---|---|---|---|---|---|
hellaswag | 1 | none | 0 | acc | ↑ | 0.4772 | ± | 0.0050 |
hellaswag | 1 | none | 0 | acc_norm | ↑ | 0.6366 | ± | 0.0048 |
tinyMMLU | 0 | none | 0 | acc_norm | ↑ | 0.4306 | ± | N/A |
eq_bench | 2.1 | none | 0 | eqbench | ↑ | -12.9709 | ± | 2.9658 |
eq_bench | 2.1 | none | 0 | percent_parseable | ↑ | 92.9825 | ± | 1.9592 |
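
The reported standard errors can be turned into approximate confidence intervals. The sketch below copies the values from the results above and assumes a normal approximation (z = 1.96 for a 95% interval). The helper name `confidence_interval` is illustrative, and tinyMMLU is left out because no standard error is reported for it.

```python
# (task, metric, value, stderr) copied from the evaluation results.
RESULTS = [
    ("hellaswag", "acc", 0.4772, 0.0050),
    ("hellaswag", "acc_norm", 0.6366, 0.0048),
    ("eq_bench", "eqbench", -12.9709, 2.9658),
    ("eq_bench", "percent_parseable", 92.9825, 1.9592),
]


def confidence_interval(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Return (low, high) for value +/- z * stderr (normal approximation)."""
    half_width = z * stderr
    return value - half_width, value + half_width


if __name__ == "__main__":
    for task, metric, value, stderr in RESULTS:
        low, high = confidence_interval(value, stderr)
        print(f"{task:10s} {metric:18s} {value:10.4f}  95% CI [{low:.4f}, {high:.4f}]")
```

Intervals like these help when comparing this model with another checkpoint: a difference smaller than the interval width may not be meaningful.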