# Rombos-LLM-V2.5.1-Qwen-3b
A little experiment I threw together: I took a really high-quality LLM I found (arcee-ai/raspberry-3B) and merged it using the last step of my Continuous Finetuning method, outlined in the paper linked below.
https://docs.google.com/document/d/1OjbjU5AOz4Ftn9xHQrX3oFQGhQ6RDUuXQipnQ9gn6tU/edit?usp=sharing
The mergekit.yaml file is as follows:

```yaml
models:
  - model: Qwen2.5-3B-Instruct
    parameters:
      weight: 1
      density: 1
  - model: raspberry-3B
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: Qwen2.5-3B
parameters:
  weight: 1
  density: 1
  normalize: true
  int8_mask: true
dtype: bfloat16
```
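To make the `merge_method: ties` setting concrete, here is a minimal, illustrative sketch of the TIES-merging idea (trim task vectors, elect a per-parameter sign, then average the agreeing deltas) on toy parameter lists. This is not mergekit's actual implementation; the function and variable names are my own, and real merges operate on full model tensors.

```python
def ties_merge(base, models, density=1.0, weights=None):
    """Toy per-parameter TIES merge on flat parameter lists (sketch only)."""
    weights = weights or [1.0] * len(models)
    n = len(base)
    # 1. Task vectors: each fine-tune's delta from the base model
    deltas = [[m[i] - base[i] for i in range(n)] for m in models]
    # 2. Trim: keep only the top-`density` fraction of each delta by magnitude
    trimmed = []
    for d in deltas:
        k = max(1, round(density * n))
        thresh = sorted(abs(x) for x in d)[-k]
        trimmed.append([x if abs(x) >= thresh else 0.0 for x in d])
    sign = lambda x: (x > 0) - (x < 0)
    # 3. Elect a sign per parameter from the weighted sum of trimmed deltas
    elected = [sign(sum(w * t[i] for w, t in zip(weights, trimmed)))
               for i in range(n)]
    # 4. Merge: weighted-average only the deltas that agree with the elected sign
    merged = []
    for i in range(n):
        agree = [(w, t[i]) for w, t in zip(weights, trimmed)
                 if t[i] != 0 and sign(t[i]) == elected[i]]
        delta = sum(w * v for w, v in agree) / max(len(agree), 1)
        merged.append(base[i] + delta)
    return merged
```

With `density: 1` and equal weights (as in the config above), no trimming occurs and the merge reduces to sign-elected averaging of the two fine-tunes' deltas on top of the base model.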
## Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric              | Value |
|---------------------|------:|
| Avg.                | 13.22 |
| IFEval (0-Shot)     | 25.95 |
| BBH (3-Shot)        | 14.88 |
| MATH Lvl 5 (4-Shot) |  8.31 |
| GPQA (0-Shot)       |  3.24 |
| MuSR (0-Shot)       |  7.82 |
| MMLU-PRO (5-Shot)   | 19.10 |