Update README.md
README.md CHANGED
@@ -134,6 +134,22 @@ I used this string to visualize it, where 0 are original layers and 1 duplicated
 The main idea is that the input/output difference of middle layers is quite small, so replicating a middle layer has a small impact on the output.
 The additional layers are designed to increase the model's capacity without breaking the information flow, which often creates "insane" self-merges.
 
+## 🏆 Evaluation
+
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mlabonne__BigQwen2.5-Echo-47B-Instruct).
+
+TBD: add mlabonne/BigQwen2.5-52B-Instruct's results.
+
+| Metric             |**BigQwen2.5-Echo-47B-Instruct**|Qwen2.5-32B-Instruct|
+|--------------------|----:|----:|
+|Avg.                |30.31|36.17|
+|IFEval (0-Shot)     |73.57|83.46|
+|BBH (3-Shot)        |44.52|56.49|
+|MATH Lvl 5 (4-Shot) | 3.47|    0|
+|GPQA (0-shot)       | 8.61|11.74|
+|MuSR (0-shot)       |10.19| 13.5|
+|MMLU-PRO (5-shot)   |41.49|51.85|
+
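As a quick sanity check on the table above, the leaderboard's Avg. column is consistent with a plain mean of the six benchmark scores. A minimal sketch (the score lists are copied from the table, not computed from the source):

```python
# Verify that Avg. is the plain mean of the six Open LLM Leaderboard scores.
echo_scores = [73.57, 44.52, 3.47, 8.61, 10.19, 41.49]  # BigQwen2.5-Echo-47B-Instruct
qwen_scores = [83.46, 56.49, 0.0, 11.74, 13.5, 51.85]   # Qwen2.5-32B-Instruct

echo_avg = round(sum(echo_scores) / len(echo_scores), 2)
qwen_avg = round(sum(qwen_scores) / len(qwen_scores), 2)
print(echo_avg, qwen_avg)  # 30.31 36.17, matching the Avg. row
```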
 ## 🧩 Configuration
 
 The following YAML configuration was used to produce this model:
@@ -315,17 +331,4 @@ pipeline = transformers.pipeline(
 
 outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
 print(outputs[0]["generated_text"])
-```
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mlabonne__BigQwen2.5-Echo-47B-Instruct)
-
-| Metric            |Value|
-|-------------------|----:|
-|Avg.               |30.31|
-|IFEval (0-Shot)    |73.57|
-|BBH (3-Shot)       |44.52|
-|MATH Lvl 5 (4-Shot)| 3.47|
-|GPQA (0-shot)      | 8.61|
-|MuSR (0-shot)      |10.19|
-|MMLU-PRO (5-shot)  |41.49|
-
+```
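The layer-replication idea described in the hunk above is a mergekit passthrough self-merge. It can be sketched with a configuration like the one below; the model name and slice boundaries here are illustrative placeholders, not this model's actual configuration, which is given in the README's 🧩 Configuration section:

```yaml
# Illustrative passthrough self-merge (hypothetical layer ranges).
# The two slices overlap on layers 16-31, so those middle layers appear
# twice in the merged model while the rest appear once.
slices:
  - sources:
      - model: Qwen/Qwen2.5-32B-Instruct
        layer_range: [0, 32]
  - sources:
      - model: Qwen/Qwen2.5-32B-Instruct
        layer_range: [16, 48]
merge_method: passthrough
dtype: bfloat16
```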
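The 0/1 visualization string mentioned in the first hunk header (0 for an original layer, 1 for a duplicated one) can be derived directly from a list of layer slices. The helper and slice boundaries below are hypothetical, for illustration only:

```python
# Build the 0/1 layer-visualization string from half-open [start, end) slices:
# the first occurrence of a layer index prints "0", any repeat prints "1".
def visualize(slices):
    seen = set()
    out = []
    for start, end in slices:
        for layer in range(start, end):
            out.append("0" if layer not in seen else "1")
            seen.add(layer)
    return "".join(out)

# Hypothetical example: duplicate the middle layers 16-31 of a 48-layer model.
print(visualize([(0, 32), (16, 48)]))
# 32 original layers, then 16 duplicated middle layers, then 16 original layers
```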