leaderboard-pt-pr-bot committed
Commit 7c82b7b (1 parent: e24cd6a)

Adding the Open Portuguese LLM Leaderboard Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
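Once this PR is merged, the `model-index` metadata added below can be read back programmatically. A minimal sketch (not part of this PR), assuming `huggingface_hub` is installed and using the model id `botbot-ai/CabraLlama3-8b` taken from the leaderboard query URLs in the diff:

```python
# Minimal sketch: read the eval results that this PR adds to the model card's
# model-index metadata. Assumes the PR has been merged and huggingface_hub is
# installed; the model id comes from the leaderboard query URL in the diff.
from huggingface_hub import ModelCard

card = ModelCard.load("botbot-ai/CabraLlama3-8b")
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```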

Files changed (1)
  1. README.md +167 -1
README.md CHANGED
@@ -1,7 +1,154 @@
 ---
-license: cc
 language:
 - pt
+license: cc
+model-index:
+- name: CabraLlama3-8b
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 74.67
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 56.88
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 49.29
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 90.44
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 69.85
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 70.38
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 85.05
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 60.1
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 68.08
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
+      name: Open Portuguese LLM Leaderboard
 ---
 
 Llama 3 8b Instruct finetuned with Cabra 30k.
@@ -24,3 +171,22 @@ Evals
 |oab_exams                    |acc     |0.5062 |0.0062|0.4911 |0.0062|
 |portuguese_hate_speech_binary|f1_macro|0.5982 |0.0120|0.5954 |0.0120|
 |                             |acc     |0.5993 |0.0119|0.5993 |0.0119|
+
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/botbot-ai/CabraLlama3-8b) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+|          Metric          |  Value  |
+|--------------------------|---------|
+|Average                   |**69.42**|
+|ENEM Challenge (No Images)|    74.67|
+|BLUEX (No Images)         |    56.88|
+|OAB Exams                 |    49.29|
+|Assin2 RTE                |    90.44|
+|Assin2 STS                |    69.85|
+|FaQuAD NLI                |    70.38|
+|HateBR Binary             |    85.05|
+|PT Hate Speech Binary     |    60.10|
+|tweetSentBR               |    68.08|
+
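For reference, the Average row in the table this PR appends to the README is the arithmetic mean of the nine task scores. A minimal sanity-check sketch (hypothetical, not part of the PR):

```python
# Hypothetical sanity check (not part of the PR): recompute the "Average" row
# from the nine per-task scores shown in the table above.
scores = [74.67, 56.88, 49.29, 90.44, 69.85, 70.38, 85.05, 60.10, 68.08]
average = sum(scores) / len(scores)
print(f"{average:.2f}")  # 69.42
```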