leaderboard-pt-pr-bot committed
Commit 4a2d177
1 Parent(s): bfd010d

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +167 -1
README.md CHANGED
@@ -1,6 +1,153 @@
  ---
  library_name: transformers
  tags: []
+ model-index:
+ - name: Qwen1.5-32B-Dolphin-Portuguese-v0.1
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 74.74
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Qwen1.5-32B-Dolphin-Portuguese-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 66.34
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Qwen1.5-32B-Dolphin-Portuguese-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 53.71
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Qwen1.5-32B-Dolphin-Portuguese-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 93.66
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Qwen1.5-32B-Dolphin-Portuguese-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 77.7
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Qwen1.5-32B-Dolphin-Portuguese-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 82.14
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Qwen1.5-32B-Dolphin-Portuguese-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 86.71
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Qwen1.5-32B-Dolphin-Portuguese-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 68.68
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Qwen1.5-32B-Dolphin-Portuguese-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 72.82
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Qwen1.5-32B-Dolphin-Portuguese-v0.1
+       name: Open Portuguese LLM Leaderboard
  ---

  # Model Card for Model ID
@@ -196,4 +343,23 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

  ## Model Card Contact

- [More Information Needed]
+ [More Information Needed]
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/adalbertojunior/Qwen1.5-32B-Dolphin-Portuguese-v0.1) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**75.17**|
+ |ENEM Challenge (No Images)|    74.74|
+ |BLUEX (No Images)         |    66.34|
+ |OAB Exams                 |    53.71|
+ |Assin2 RTE                |    93.66|
+ |Assin2 STS                |    77.70|
+ |FaQuAD NLI                |    82.14|
+ |HateBR Binary             |    86.71|
+ |PT Hate Speech Binary     |    68.68|
+ |tweetSentBR               |    72.82|
+
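
The PR does not say how the Average row is derived. The sketch below assumes it is the unweighted mean of the nine task scores, rounded to two decimals, which is consistent with the values in the table. The function name `leaderboard_average` is hypothetical and is not part of the leaderboard's tooling.

```python
# Scores copied from the results table in this PR (values in percent).
scores = {
    "ENEM Challenge (No Images)": 74.74,
    "BLUEX (No Images)": 66.34,
    "OAB Exams": 53.71,
    "Assin2 RTE": 93.66,
    "Assin2 STS": 77.70,
    "FaQuAD NLI": 82.14,
    "HateBR Binary": 86.71,
    "PT Hate Speech Binary": 68.68,
    "tweetSentBR": 72.82,
}


def leaderboard_average(task_scores):
    """Unweighted mean of the task scores, rounded to two decimals.

    Assumption: the leaderboard weights every task equally; the PR
    itself does not state the aggregation rule.
    """
    return round(sum(task_scores.values()) / len(task_scores), 2)


average = leaderboard_average(scores)
print(f"Average: {average:.2f}")  # prints "Average: 75.17"
```

Under that assumption the computed mean matches the reported **75.17**, so the table and the `model-index` metric values agree with each other.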