leaderboard-pr-bot committed on
Commit
d0fa7cd
1 Parent(s): bd17453

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +112 -14
README.md CHANGED
@@ -16,7 +16,6 @@ tags:
 base_model: meta-llama/Meta-Llama-3-70B-Instruct
 datasets:
 - MaziyarPanahi/truthy-dpo-v0.1-axolotl
-model_name: calme-2.3-llama3-70b
 pipeline_tag: text-generation
 license_name: llama3
 license_link: LICENSE
@@ -41,8 +40,7 @@ model-index:
       value: 72.35
       name: normalized accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -58,8 +56,7 @@ model-index:
       value: 86
       name: normalized accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -76,8 +73,7 @@ model-index:
       value: 80.47
       name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -93,8 +89,7 @@ model-index:
     - type: mc2
       value: 63.45
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -111,8 +106,7 @@ model-index:
       value: 82.95
       name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -129,8 +123,99 @@ model-index:
       value: 87.19
       name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 80.1
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 48.01
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 21.9
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 11.74
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 12.57
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 46.72
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.3-llama3-70b
       name: Open LLM Leaderboard
 ---
 
@@ -239,4 +324,17 @@ outputs = pipeline(
     top_p=0.95,
 )
 print(outputs[0]["generated_text"][len(prompt):])
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__calme-2.3-llama3-70b)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |36.84|
+|IFEval (0-Shot)    |80.10|
+|BBH (3-Shot)       |48.01|
+|MATH Lvl 5 (4-Shot)|21.90|
+|GPQA (0-shot)      |11.74|
+|MuSR (0-shot)      |12.57|
+|MMLU-PRO (5-shot)  |46.72|
+
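
The `Avg.` row in the table added by this PR is consistent with an unweighted mean of the six benchmark scores. A minimal Python sketch of that arithmetic follows. It assumes a plain mean rounded to two decimals; the leaderboard's own aggregation code is not part of this PR, and the names `scores` and `average_score` are illustrative only.

```python
# Scores copied from the results table added to README.md by this PR.
scores = {
    "IFEval (0-Shot)": 80.10,
    "BBH (3-Shot)": 48.01,
    "MATH Lvl 5 (4-Shot)": 21.90,
    "GPQA (0-shot)": 11.74,
    "MuSR (0-shot)": 12.57,
    "MMLU-PRO (5-shot)": 46.72,
}


def average_score(values):
    """Unweighted arithmetic mean, rounded to two decimals (an assumption)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


avg = average_score(scores.values())
print(f"Avg. {avg:.2f}")
```

Rerunning this after a future bot update is a quick way to check that the reported average still matches the per-benchmark rows.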