# Rombos-LLM-V2.5.1-Qwen-3b
A little experiment I threw together: I took a really high-quality LLM I found (arcee-ai/raspberry-3B) and merged it using the last step of my Continuous Finetuning method, outlined in the paper linked below.
https://docs.google.com/document/d/1OjbjU5AOz4Ftn9xHQrX3oFQGhQ6RDUuXQipnQ9gn6tU/edit?usp=sharing
The mergekit.yaml file is as follows:

```yaml
models:
  - model: Qwen2.5-3B-Instruct
    parameters:
      weight: 1
      density: 1
  - model: raspberry-3B
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: Qwen2.5-3B
parameters:
  weight: 1
  density: 1
  normalize: true
  int8_mask: true
dtype: bfloat16
```
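To make the `merge_method: ties` setting concrete, here is a minimal, illustrative sketch of the TIES-merging idea (trim task vectors, elect a per-parameter sign, then average the agreeing deltas) on toy parameter lists. This is not mergekit's actual implementation; the function and variable names are my own, and real merges operate on full model tensors.

```python
def ties_merge(base, models, density=1.0, weights=None):
    """Toy per-parameter TIES merge on flat parameter lists (sketch only)."""
    weights = weights or [1.0] * len(models)
    n = len(base)
    # 1. Task vectors: each fine-tune's delta from the base model
    deltas = [[m[i] - base[i] for i in range(n)] for m in models]
    # 2. Trim: keep only the top-`density` fraction of each delta by magnitude
    trimmed = []
    for d in deltas:
        k = max(1, round(density * n))
        thresh = sorted(abs(x) for x in d)[-k]
        trimmed.append([x if abs(x) >= thresh else 0.0 for x in d])
    sign = lambda x: (x > 0) - (x < 0)
    # 3. Elect a sign per parameter from the weighted sum of trimmed deltas
    elected = [sign(sum(w * t[i] for w, t in zip(weights, trimmed)))
               for i in range(n)]
    # 4. Merge: weighted-average only the deltas that agree with the elected sign
    merged = []
    for i in range(n):
        agree = [(w, t[i]) for w, t in zip(weights, trimmed)
                 if t[i] != 0 and sign(t[i]) == elected[i]]
        delta = sum(w * v for w, v in agree) / max(len(agree), 1)
        merged.append(base[i] + delta)
    return merged
```

With `density: 1` and equal weights (as in the config above), no trimming occurs and the merge reduces to sign-elected averaging of the two fine-tunes' deltas on top of the base model.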
## Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric              | Value |
|---------------------|------:|
| Avg.                | 13.22 |
| IFEval (0-Shot)     | 25.95 |
| BBH (3-Shot)        | 14.88 |
| MATH Lvl 5 (4-Shot) |  8.31 |
| GPQA (0-Shot)       |  3.24 |
| MuSR (0-Shot)       |  7.82 |
| MMLU-PRO (5-Shot)   | 19.10 |