mlabonne committed on
Commit e0f63c2
1 Parent(s): bf0169b

Update README.md

Files changed (1)
  1. README.md +17 -14
README.md CHANGED
@@ -134,6 +134,22 @@ I used this string to visualize it, where 0 are original layers and 1 duplicated
 The main idea is that the input/output difference of middle layers is quite small, so replicating a middle layer has a small impact on the output.
 The additional layers are designed to increase the model's capacity without breaking the information flow, which often creates "insane" self-merges.
 
+## 🏆 Evaluation
+
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mlabonne__BigQwen2.5-Echo-47B-Instruct).
+
+TBD: add mlabonne/BigQwen2.5-52B-Instruct's results.
+
+| Metric |**BigQwen2.5-Echo-47B-Instruct**|Qwen2.5-32B-Instruct|
+|-------------------|----:|----:|
+|Avg. |30.31|36.17|
+|IFEval (0-Shot) |73.57|83.46|
+|BBH (3-Shot) |44.52|56.49|
+|MATH Lvl 5 (4-Shot)| 3.47|0|
+|GPQA (0-shot) | 8.61|11.74|
+|MuSR (0-shot) |10.19|13.5|
+|MMLU-PRO (5-shot) |41.49|51.85|
+
 ## 🧩 Configuration
 
 The following YAML configuration was used to produce this model:
@@ -315,17 +331,4 @@ pipeline = transformers.pipeline(
 
 outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
 print(outputs[0]["generated_text"])
-```
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mlabonne__BigQwen2.5-Echo-47B-Instruct)
-
-| Metric |Value|
-|-------------------|----:|
-|Avg. |30.31|
-|IFEval (0-Shot) |73.57|
-|BBH (3-Shot) |44.52|
-|MATH Lvl 5 (4-Shot)| 3.47|
-|GPQA (0-shot) | 8.61|
-|MuSR (0-shot) |10.19|
-|MMLU-PRO (5-shot) |41.49|
-
+```
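
The hunk context above mentions a 0/1 string used to visualize which layers of the self-merge are duplicated. As a rough sketch of how such a string can be generated, one might do the following — the layer count and duplicated ranges below are hypothetical placeholders, not the actual BigQwen2.5-Echo-47B-Instruct recipe:

```python
# Sketch: build a 0/1 visualization string for a passthrough self-merge,
# where '0' marks an original layer and '1' marks a duplicated copy.
# The layer count and duplicated range here are HYPOTHETICAL, not the
# actual BigQwen2.5-Echo recipe.

def visualize(num_layers, duplicated):
    """Return one character per output layer: '0' original, '1' duplicate.

    `duplicated` is a set of source-layer indices that appear twice in the
    merged model; each duplicate is emitted right after its original.
    """
    chars = []
    for i in range(num_layers):
        chars.append("0")
        if i in duplicated:
            chars.append("1")  # the replicated copy follows the original
    return "".join(chars)

# Hypothetical example: a 64-layer base with middle layers 24-39 duplicated,
# yielding an 80-layer merged model.
pattern = visualize(64, duplicated=set(range(24, 40)))
print(pattern)
```

The resulting string makes it easy to eyeball where capacity was added: duplicating only middle layers keeps the early (embedding-adjacent) and late (output-adjacent) layers untouched, matching the rationale that middle-layer input/output differences are small.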