leaderboard-pt-pr-bot committed
Commit 26634ac • 1 Parent(s): 661fa9d

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
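Each leaderboard score lands in the card's `model-index` YAML front matter as one result entry (task, dataset, metrics, source). As a minimal sketch of how such an entry can be consumed, the dict below mirrors the first entry this PR adds (hand-copied stand-in for the parsed YAML; `flatten` is a hypothetical helper, not part of any Hugging Face API):

```python
# Minimal sketch: one model-index result entry, mirroring the YAML this PR adds.
# The dict literal stands in for YAML parsed from the README front matter.
entry = {
    "task": {"type": "text-generation", "name": "Text Generation"},
    "dataset": {
        "name": "ENEM Challenge (No Images)",
        "type": "eduagarcia/enem_challenge",
        "split": "train",
        "args": {"num_few_shot": 3},
    },
    "metrics": [{"type": "acc", "value": 75.86, "name": "accuracy"}],
    "source": {
        "url": "https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=alpindale/WizardLM-2-8x22B",
        "name": "Open Portuguese LLM Leaderboard",
    },
}

def flatten(result: dict) -> list[tuple[str, str, float]]:
    """Return (dataset_name, metric_type, value) rows for one result entry."""
    ds = result["dataset"]["name"]
    return [(ds, m["type"], m["value"]) for m in result["metrics"]]

print(flatten(entry))  # [('ENEM Challenge (No Images)', 'acc', 75.86)]
```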

Files changed (1):
  1. README.md (+162, -0)
README.md CHANGED
@@ -95,6 +95,150 @@ model-index:
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=alpindale/WizardLM-2-8x22B
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 75.86
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=alpindale/WizardLM-2-8x22B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 65.79
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=alpindale/WizardLM-2-8x22B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 56.45
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=alpindale/WizardLM-2-8x22B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 94.12
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=alpindale/WizardLM-2-8x22B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 79.32
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=alpindale/WizardLM-2-8x22B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 76.97
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=alpindale/WizardLM-2-8x22B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 82.06
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=alpindale/WizardLM-2-8x22B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 72.88
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=alpindale/WizardLM-2-8x22B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 72.59
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=alpindale/WizardLM-2-8x22B
+      name: Open Portuguese LLM Leaderboard
 ---
 
 
@@ -214,3 +358,21 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |MuSR (0-shot)     |14.54|
 |MMLU-PRO (5-shot) |39.96|
 
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/alpindale/WizardLM-2-8x22B) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+| Metric                   |  Value  |
+|--------------------------|---------|
+|Average                   |**75.11**|
+|ENEM Challenge (No Images)|    75.86|
+|BLUEX (No Images)         |    65.79|
+|OAB Exams                 |    56.45|
+|Assin2 RTE                |    94.12|
+|Assin2 STS                |    79.32|
+|FaQuAD NLI                |    76.97|
+|HateBR Binary             |    82.06|
+|PT Hate Speech Binary     |    72.88|
+|tweetSentBR               |    72.59|
+
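The leaderboard computes the Average over unrounded per-task scores, so recomputing it from the rounded table values lands a hundredth or two off the reported 75.11. A quick sanity check of the table (not the official calculation):

```python
# Sanity-check: mean of the nine rounded per-task scores from the table above.
scores = {
    "ENEM Challenge (No Images)": 75.86,
    "BLUEX (No Images)": 65.79,
    "OAB Exams": 56.45,
    "Assin2 RTE": 94.12,
    "Assin2 STS": 79.32,
    "FaQuAD NLI": 76.97,
    "HateBR Binary": 82.06,
    "PT Hate Speech Binary": 72.88,
    "tweetSentBR": 72.59,
}
average = sum(scores.values()) / len(scores)
# Rounded inputs give ~75.12; the leaderboard reports 75.11 from unrounded scores.
print(round(average, 2))
assert abs(average - 75.11) < 0.05
```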