UNA-SimpleSmaug-34b-v1beta

Scored #1 among 34B models on 04-February-2024, outperforming its original base model Smaug-34B-v0.1 with an average of 77.41 😎 Oh, btw.. this one went through SFT, so the abacus inside Smaug is back to normal.. which means you can further train/DPO it.. RESET!..

UPDATES March: Still the undisputed 34B King. Smaug 70B is still the undisputed 70B King.

And people wonder.. why is there no UNA of Hermes or Smaug 70B? << I don't think it is worth spending the time on a model that is widely known for not being too useful, although UNA can likely fix some of the internal mess. For Hermes, we had a quick chat a couple of times but nothing solid. Still, we would like to give excellent models a rebirth using UNA, just like we did with UNA-Dolphin, where we saw relevant performance in a short time.

UNA was applied only to the attention layers, not to the MLPs.

Based on Smaug
Trained on the SimpleMath dataset
Trained with Axolotl
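
To make "attention only, not the MLPs" concrete, here is a minimal sketch of how parameters could be separated by name. It does not implement UNA, which this card does not describe. It only assumes the Llama-style parameter naming used by Hugging Face Transformers (`self_attn.*` and `mlp.*`), and the helper `split_parameters` and the example names are hypothetical.

```python
# Hypothetical illustration only: this is NOT the UNA method.
# Assumption: Llama-style parameter names as used by Hugging Face Transformers.
ATTENTION_KEYS = (
    "self_attn.q_proj",
    "self_attn.k_proj",
    "self_attn.v_proj",
    "self_attn.o_proj",
)
MLP_KEYS = ("mlp.gate_proj", "mlp.up_proj", "mlp.down_proj")


def split_parameters(names):
    """Partition parameter names into attention, MLP, and everything else."""
    groups = {"attention": [], "mlp": [], "other": []}
    for name in names:
        if any(key in name for key in ATTENTION_KEYS):
            groups["attention"].append(name)
        elif any(key in name for key in MLP_KEYS):
            groups["mlp"].append(name)
        else:
            groups["other"].append(name)
    return groups


# Example names for a single decoder layer (made up for the demo).
example_names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.0.input_layernorm.weight",
    "lm_head.weight",
]

groups = split_parameters(example_names)
for group, members in groups.items():
    print(group, members)
```

With a loaded model, the same filter could be run over `model.named_parameters()` to decide which tensors a given training step is allowed to touch.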

Experiment

The goal here is to understand the impact of SimpleMath applied at the attention layer during an SFT session, and how it affects the neural network overall.

Results: improved mathematical and reasoning capabilities, without degrading the model and while preserving previous training sessions.

And enjoy our ModelSimilarities detector tool https://github.com/fblgit/model-similarity, where we numerically confirmed the bloodties of the model.

Evals

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 77.41 |
| AI2 Reasoning Challenge (25-Shot) | 74.57 |
| HellaSwag (10-Shot)               | 86.74 |
| MMLU (5-Shot)                     | 76.68 |
| TruthfulQA (0-shot)               | 70.17 |
| Winogrande (5-shot)               | 83.82 |
| GSM8k (5-shot)                    | 72.48 |
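
The reported average is consistent with a plain arithmetic mean of the six benchmark scores. A quick check, assuming equal weighting of the tasks (the name `scores` is just for illustration):

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 74.57,
    "HellaSwag (10-Shot)": 86.74,
    "MMLU (5-Shot)": 76.68,
    "TruthfulQA (0-shot)": 70.17,
    "Winogrande (5-shot)": 83.82,
    "GSM8k (5-shot)": 72.48,
}

# Unweighted mean over the six tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 77.41
```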

|    Task     |Version| Metric |Value            |
|-------------|------:|--------|----------------:|
|arc_challenge|     HF|acc_norm| 0.7457337883959 |
|gsm8k        |     HF|acc     | 0.7247915087187 |
|mmlu         |     HF|acc     | 0.7649553475572 |
|mmlu         |     HF|acc_norm| 0.7681713551647 |
|hellaswag    |     HF|acc_norm| 0.8673571001792 | 
|truthfulqa   |     HF|mc2     | 0.7016557407771 |
|winogrande   |     HF|acc     | 0.8382004735595 |
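
The raw values in the per-task table can be lined up against the leaderboard percentages reported in this card by scaling to percent and rounding to two decimals. The sketch below does only that. Five of the six tasks agree. For MMLU, neither `acc` nor `acc_norm` rounds to the reported 76.68, which may simply reflect a different evaluation run or aggregation (that is an assumption, the card does not say).

```python
# Raw values from the per-task table (fractions in [0, 1]).
raw = {
    ("arc_challenge", "acc_norm"): 0.7457337883959,
    ("gsm8k", "acc"): 0.7247915087187,
    ("mmlu", "acc"): 0.7649553475572,
    ("mmlu", "acc_norm"): 0.7681713551647,
    ("hellaswag", "acc_norm"): 0.8673571001792,
    ("truthfulqa", "mc2"): 0.7016557407771,
    ("winogrande", "acc"): 0.8382004735595,
}

# Leaderboard percentages reported in this card.
reported = {
    "arc_challenge": 74.57,
    "gsm8k": 72.48,
    "mmlu": 76.68,
    "hellaswag": 86.74,
    "truthfulqa": 70.17,
    "winogrande": 83.82,
}

# Scale to percent, round to two decimals, and compare per task.
percent = {key: round(value * 100, 2) for key, value in raw.items()}
matches = {key: percent[key] == reported[key[0]] for key in raw}

for (task, metric), value in percent.items():
    status = "matches" if matches[(task, metric)] else "differs from"
    print(f"{task:14s} {metric:9s} {value:6.2f} {status} {reported[task]:.2f}")
```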

Scores increased on GSM8k, MMLU, ARC, and Winogrande.

Citations

To abacusai for making Smaug-34B, the Bagel, and all the magic behind the base model.

If you use the model, please cite it, even for merges or any other derivative.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                | 23.12 |
| IFEval (0-Shot)     | 45.56 |
| BBH (3-Shot)        | 32.78 |
| MATH Lvl 5 (4-Shot) |  0.15 |
| GPQA (0-shot)       |  8.95 |
| MuSR (0-shot)       | 11.96 |
| MMLU-PRO (5-shot)   | 39.33 |

Model size: 34.4B params (Safetensors), tensor type BF16

Model tree for fblgit/UNA-SimpleSmaug-34b-v1beta: base model jondurbin/bagel-34b-v0.2, finetuned into abacusai/Smaug-34B-v0.1, finetuned into this model.

