Adding Evaluation Results #2, opened by leaderboard-pr-bot

README.md (changed)
@@ -139,4 +139,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 
 In addition, to the official Open LLM Leaderboard, the results on OpenLLM Eval have been validated by [others as well (76.59)](https://github.com/saucam/model_evals/tree/main?tab=readme-ov-file#model-eval-results).
 
-Our own initial eval is available [here (76.37)](https://gist.github.com/codelion/78f88333230801c9bbaa6fc22078d820).
+Our own initial eval is available [here (76.37)](https://gist.github.com/codelion/78f88333230801c9bbaa6fc22078d820).
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_meraGPT__mera-mix-4x7B)
+
+| Metric |Value|
+|---------------------------------|----:|
+|Avg. |75.91|
+|AI2 Reasoning Challenge (25-Shot)|72.95|
+|HellaSwag (10-Shot) |89.17|
+|MMLU (5-Shot) |64.44|
+|TruthfulQA (0-shot) |77.17|
+|Winogrande (5-shot) |85.64|
+|GSM8k (5-shot) |66.11|
+
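The Avg. row in the added table matches the unweighted mean of the six benchmark scores: (72.95 + 89.17 + 64.44 + 77.17 + 85.64 + 66.11) / 6 = 455.48 / 6 ≈ 75.91. The short Python sketch below reproduces that arithmetic from the values in the table. The `scores` dictionary and the `leaderboard_average` helper are hypothetical illustrations, not code from the leaderboard or this PR, and equal weighting of the benchmarks is an assumption that happens to reproduce the reported figure.

```python
# Per-benchmark scores copied from the table added in this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 72.95,
    "HellaSwag (10-Shot)": 89.17,
    "MMLU (5-Shot)": 64.44,
    "TruthfulQA (0-shot)": 77.17,
    "Winogrande (5-shot)": 85.64,
    "GSM8k (5-shot)": 66.11,
}


def leaderboard_average(results: dict[str, float]) -> float:
    """Hypothetical helper: unweighted mean of the scores, rounded to 2 decimals.

    Assumes every benchmark carries equal weight in the "Avg." row.
    """
    return round(sum(results.values()) / len(results), 2)


if __name__ == "__main__":
    print(f"Avg. {leaderboard_average(scores):.2f}")
```

Running it prints `Avg. 75.91`, the same value as the Avg. row.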