leaderboard-pt-pr-bot committed cd9da08
1 parent: 67e9882

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
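Each benchmark the bot adds becomes one item in the `results` list under `model-index` in the README's YAML front matter, with `task`, `dataset`, `metrics` and `source` keys. The sketch below builds one such item as a plain Python dict that mirrors the field layout in this diff. The helper name `pt_leaderboard_entry` is hypothetical, and this is an illustration under that assumption, not the bot's actual code.

```python
from urllib.parse import urlencode

# Leaderboard Space URL as it appears in the PR's `source.url` fields.
LEADERBOARD_SPACE = "https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard"


def pt_leaderboard_entry(model_id: str, dataset_name: str, dataset_type: str,
                         split: str, num_few_shot: int, metric_type: str,
                         metric_name: str, value: float) -> dict:
    """Hypothetical helper: build one `model-index` result item in the
    task / dataset / metrics / source layout used by this PR."""
    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": {
            "name": dataset_name,
            "type": dataset_type,
            "split": split,
            "args": {"num_few_shot": num_few_shot},
        },
        "metrics": [{"type": metric_type, "value": value, "name": metric_name}],
        "source": {
            # Link back to the leaderboard filtered by model id; keep "/" unescaped.
            "url": f"{LEADERBOARD_SPACE}?{urlencode({'query': model_id}, safe='/')}",
            "name": "Open Portuguese LLM Leaderboard",
        },
    }


# The ENEM Challenge entry from this PR, rebuilt with the helper.
entry = pt_leaderboard_entry(
    "Weyaxi/Bagel-Hermes-34B-Slerp", "ENEM Challenge (No Images)",
    "eduagarcia/enem_challenge", "train", 3, "acc", "accuracy", 74.32,
)
```

Returning a dict rather than a pre-formatted YAML string keeps the structure easy to inspect or validate before it is serialized with whatever YAML tooling is at hand.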

Files changed (1)
  1. README.md +162 -0
README.md CHANGED
@@ -198,6 +198,150 @@ model-index:
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Bagel-Hermes-34B-Slerp
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 74.32
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Weyaxi/Bagel-Hermes-34B-Slerp
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 67.59
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Weyaxi/Bagel-Hermes-34B-Slerp
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 55.13
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Weyaxi/Bagel-Hermes-34B-Slerp
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 92.16
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Weyaxi/Bagel-Hermes-34B-Slerp
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 82.23
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Weyaxi/Bagel-Hermes-34B-Slerp
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 83.79
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Weyaxi/Bagel-Hermes-34B-Slerp
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 78.14
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Weyaxi/Bagel-Hermes-34B-Slerp
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 71.3
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Weyaxi/Bagel-Hermes-34B-Slerp
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 72.18
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Weyaxi/Bagel-Hermes-34B-Slerp
+      name: Open Portuguese LLM Leaderboard
 ---
 # Bagel-Hermes-34B-Slerp
 
@@ -270,3 +414,21 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |MuSR (0-shot) |17.01|
 |MMLU-PRO (5-shot) |41.15|
 
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/Weyaxi/Bagel-Hermes-34B-Slerp) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+|          Metric          | Value  |
+|--------------------------|--------|
+|Average                   |**75.2**|
+|ENEM Challenge (No Images)|   74.32|
+|BLUEX (No Images)         |   67.59|
+|OAB Exams                 |   55.13|
+|Assin2 RTE                |   92.16|
+|Assin2 STS                |   82.23|
+|FaQuAD NLI                |   83.79|
+|HateBR Binary             |   78.14|
+|PT Hate Speech Binary     |   71.30|
+|tweetSentBR               |   72.18|
+
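The `Average` row in the added table is consistent with the unweighted arithmetic mean of the nine task scores: they sum to 676.84, and 676.84 / 9 ≈ 75.20, which matches the reported **75.2**. The sketch below recomputes that mean and renders the summary table from the same (name, value) pairs. The function name `render_summary`, the column widths and the number formatting are assumptions chosen to reproduce the table above, not the bot's actual implementation.

```python
# Scores copied from the table added in this PR (0-100 scale).
SCORES = [
    ("ENEM Challenge (No Images)", 74.32),
    ("BLUEX (No Images)", 67.59),
    ("OAB Exams", 55.13),
    ("Assin2 RTE", 92.16),
    ("Assin2 STS", 82.23),
    ("FaQuAD NLI", 83.79),
    ("HateBR Binary", 78.14),
    ("PT Hate Speech Binary", 71.30),
    ("tweetSentBR", 72.18),
]


def average(scores):
    """Unweighted arithmetic mean of the task scores."""
    return sum(value for _, value in scores) / len(scores)


def render_summary(scores, label_width=26, value_width=8):
    """Hypothetical helper: render a Markdown table in the layout used above."""
    # Average is rounded to two decimals and bolded; per-task values use .2f.
    avg = f"**{round(average(scores), 2)}**"
    lines = [
        f"|{'Metric':^{label_width}}|{'Value':^{value_width}}|",
        f"|{'-' * label_width}|{'-' * value_width}|",
        f"|{'Average':<{label_width}}|{avg:>{value_width}}|",
    ]
    for name, value in scores:
        lines.append(f"|{name:<{label_width}}|{value:>{value_width}.2f}|")
    return "\n".join(lines)


print(render_summary(SCORES))
```

Deriving both the YAML `value` fields and this table from a single list is one way to keep the two copies of each score from drifting apart; the PR itself does not say how the bot generates them.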