leaderboard-pr-bot committed on
Commit
f45ca1b
1 Parent(s): aa176c0

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -105,6 +105,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 52.76
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 22.39
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 3.47
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 0.0
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 3.65
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 22.85
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
+      name: Open LLM Leaderboard
 ---
 
 
@@ -156,3 +248,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |79.40|
 |GSM8k (5-shot) |72.93|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Kukedlc__NeuralLLaMa-3-8b-ORPO-v0.3)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |17.52|
+|IFEval (0-Shot)    |52.76|
+|BBH (3-Shot)       |22.39|
+|MATH Lvl 5 (4-Shot)| 3.47|
+|GPQA (0-shot)      | 0.00|
+|MuSR (0-shot)      | 3.65|
+|MMLU-PRO (5-shot)  |22.85|
+
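
The `Avg.` row in the results table added by this PR can be sanity-checked against the six benchmark scores. The sketch below assumes that `Avg.` is the unweighted arithmetic mean of those scores rounded to two decimals; the PR does not state this, though the numbers are consistent with it. The `scores` dictionary and variable names are illustrative and not part of the PR.

```python
# Benchmark scores copied from the results table in this PR.
# The dictionary itself is a hypothetical container used for illustration.
scores = {
    "IFEval (0-Shot)": 52.76,
    "BBH (3-Shot)": 22.39,
    "MATH Lvl 5 (4-Shot)": 3.47,
    "GPQA (0-shot)": 0.00,
    "MuSR (0-shot)": 3.65,
    "MMLU-PRO (5-shot)": 22.85,
}

# Assumption: "Avg." is the plain (unweighted) mean of the six scores.
average = sum(scores.values()) / len(scores)

print(f"Avg. = {average:.2f}")  # prints "Avg. = 17.52"
```

The computed value matches the 17.52 reported in the table.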