Commit c2a25d9 by leaderboard-pr-bot
1 Parent(s): 5cab6d5

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
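
Each result this PR adds to the `model-index` metadata has the same four-part shape: `task`, `dataset`, `metrics`, and `source`. The sketch below is only an illustration of that shape, not the bot's actual code: `make_result_entry` and `LEADERBOARD_URL` are hypothetical names, and the values are copied from the BBH entry in the diff.

```python
import json

# Assumption: URL copied from the `source` fields of the entries added in this PR.
LEADERBOARD_URL = (
    "https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard"
    "?query=Weyaxi/Einstein-v6.1-Llama3-8B"
)


def make_result_entry(dataset_name, dataset_type, num_few_shot,
                      metric_type, metric_name, value):
    """Hypothetical helper: build one model-index result as a plain dict."""
    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": {
            "name": dataset_name,
            "type": dataset_type,
            "args": {"num_few_shot": num_few_shot},
        },
        # `metrics` is a list, so one result can carry several metrics.
        "metrics": [{"type": metric_type, "value": value, "name": metric_name}],
        "source": {"url": LEADERBOARD_URL, "name": "Open LLM Leaderboard"},
    }


# Values taken from the BBH (3-Shot) entry in the diff.
bbh_entry = make_result_entry(
    "BBH (3-Shot)", "BBH", 3, "acc_norm", "normalized accuracy", 29.38
)
print(json.dumps(bbh_entry, indent=2))
```

Printing the dict as JSON makes the nesting easy to compare with the YAML in the diff; the MMLU-PRO entry additionally carries `config` and `split` keys under `dataset`.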

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -160,6 +160,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 45.68
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 29.38
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 5.74
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 4.25
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 11.23
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 23.68
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+      name: Open LLM Leaderboard
 ---
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/5s12oq859qLfDkkTNam_C.png)

@@ -434,3 +526,17 @@ Thanks to all open source AI community.
 If you would like to support me:

 [☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__Einstein-v6.1-Llama3-8B)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |19.99|
+|IFEval (0-Shot)    |45.68|
+|BBH (3-Shot)       |29.38|
+|MATH Lvl 5 (4-Shot)| 5.74|
+|GPQA (0-shot)      | 4.25|
+|MuSR (0-shot)      |11.23|
+|MMLU-PRO (5-shot)  |23.68|
+
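
As a quick consistency check on the table above, the `Avg.` row can be recomputed from the six benchmark rows. This is a minimal sketch that assumes `Avg.` is the unweighted mean of those rows rounded to two decimals; the PR itself does not state how the value is derived, and `mean_score` is a hypothetical helper name.

```python
# Scores copied from the results table added by this PR.
scores = {
    "IFEval (0-Shot)": 45.68,
    "BBH (3-Shot)": 29.38,
    "MATH Lvl 5 (4-Shot)": 5.74,
    "GPQA (0-shot)": 4.25,
    "MuSR (0-shot)": 11.23,
    "MMLU-PRO (5-shot)": 23.68,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to the table's two-decimal precision."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Avg. = {average:.2f}")  # prints "Avg. = 19.99"
```

The six scores sum to 119.96, and 119.96 / 6 ≈ 19.993, which rounds to the 19.99 reported in the table.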