Update README.md

README.md (changed)

````diff
@@ -143,13 +143,12 @@ print(res[0]["text"])
 ```
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_OEvortex__vortex-3b-v2)
-
-
-|Avg.
-|AI2 Reasoning Challenge (25-Shot)|39.68|
-|HellaSwag (10-Shot)
-|MMLU (5-Shot)
-|TruthfulQA (0-shot)
-|Winogrande (5-shot)
-|GSM8k (5-shot)
-|
+| Metric                            | vortex 3b | vortex 3b-v2 | dolly-v2-3b | pythia-2.8b-deduped |
+|-----------------------------------|----------:|-------------:|------------:|--------------------:|
+| Avg.                              |     35.76 |        37.46 |       25.26 |               36.72 |
+| AI2 Reasoning Challenge (25-Shot) |     31.91 |        39.68 |       22.83 |               36.26 |
+| HellaSwag (10-Shot)               |     56.89 |        65.04 |       26.55 |               60.66 |
+| MMLU (5-Shot)                     |     27.32 |        25.09 |        24.7 |               26.78 |
+| TruthfulQA (0-shot)               |     37.39 |        33.80 |           0 |               35.56 |
+| Winogrande (5-shot)               |     60.14 |        59.12 |       59.43 |               60.22 |
+| GSM8k (5-shot)                    |      0.91 |         2.05 |        1.86 |                0.83 |
````