Lin-K76 committed on
Commit e32ba06
1 Parent(s): f18b9c8

Update README.md

Files changed (1)
  1. README.md +22 -21
README.md CHANGED
@@ -138,6 +138,7 @@ lm_eval \
  --tasks openllm \
  --batch_size auto
  ```
+ Certain benchmarks for the full precision model are still being acquired. Average recovery is calculated only with metrics that both models have been evaluated on.
 
  ### Accuracy
 
@@ -156,71 +157,71 @@ lm_eval \
  <tr>
  <td>MMLU (5-shot)
  </td>
- <td>82.21
+ <td>*
  </td>
- <td>82.13
+ <td>88.34
  </td>
- <td>99.90%
+ <td>*
  </td>
  </tr>
  <tr>
  <td>ARC Challenge (25-shot)
  </td>
- <td>70.65
+ <td>73.38
  </td>
- <td>70.31
+ <td>72.61
  </td>
- <td>99.52%
+ <td>98.95%
  </td>
  </tr>
  <tr>
  <td>GSM-8K (5-shot, strict-match)
  </td>
- <td>87.95
+ <td>95.07
  </td>
- <td>88.40
+ <td>95.00
  </td>
- <td>100.5%
+ <td>99.93%
  </td>
  </tr>
  <tr>
  <td>Hellaswag (10-shot)
  </td>
- <td>86.33
+ <td>*
  </td>
- <td>86.27
+ <td>88.34
  </td>
- <td>99.93%
+ <td>*
  </td>
  </tr>
  <tr>
  <td>Winogrande (5-shot)
  </td>
- <td>85.00
+ <td>87.21
  </td>
- <td>85.00
+ <td>87.45
  </td>
- <td>100.0%
+ <td>100.2%
  </td>
  </tr>
  <tr>
  <td>TruthfulQA (0-shot)
  </td>
- <td>59.90
+ <td>*
  </td>
- <td>60.01
+ <td>64.71
  </td>
- <td>100.1%
+ <td>*
  </td>
  </tr>
  <tr>
  <td><strong>Average</strong>
  </td>
- <td><strong>78.67</strong>
+ <td><strong>*</strong>
  </td>
- <td><strong>78.69</strong>
+ <td><strong>82.38</strong>
  </td>
- <td><strong>100.0%</strong>
+ <td><strong>98.95%</strong>
  </td>
  </tr>
  </table>
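
Reviewer note: the added README sentence says average recovery uses only metrics that both models have been evaluated on. The sketch below shows one way that rule could be applied. It rests on several assumptions that the diff does not confirm: that the first score column is the full-precision model and the second is the quantized model, that recovery is the quantized score divided by the full-precision score, and that the average is a plain mean of per-benchmark recoveries. The names `SCORES`, `recovery`, and `average_recovery` are hypothetical and do not come from the README or from `lm_eval`.

```python
from typing import Dict, Optional, Tuple

# (full-precision score, quantized score) per benchmark, copied from the
# updated table. None stands for the "*" cells (not yet evaluated).
# The column order is an assumption, not something the diff states.
SCORES: Dict[str, Tuple[Optional[float], float]] = {
    "MMLU (5-shot)": (None, 88.34),
    "ARC Challenge (25-shot)": (73.38, 72.61),
    "GSM-8K (5-shot, strict-match)": (95.07, 95.00),
    "Hellaswag (10-shot)": (None, 88.34),
    "Winogrande (5-shot)": (87.21, 87.45),
    "TruthfulQA (0-shot)": (None, 64.71),
}


def recovery(baseline: Optional[float], quantized: float) -> Optional[float]:
    """Return quantized / baseline as a percentage, or None if no baseline."""
    if baseline is None:
        return None
    return 100.0 * quantized / baseline


def average_recovery(
    scores: Dict[str, Tuple[Optional[float], float]]
) -> Optional[float]:
    """Mean recovery over benchmarks that have scores for both models.

    Assumes a plain mean of per-benchmark recoveries; the README does not
    say which averaging method it uses.
    """
    values = [
        r
        for r in (recovery(b, q) for b, q in scores.values())
        if r is not None
    ]
    if not values:
        return None
    return sum(values) / len(values)


if __name__ == "__main__":
    for name, (baseline, quantized) in SCORES.items():
        r = recovery(baseline, quantized)
        label = "*" if r is None else f"{r:.2f}%"
        print(f"{name}: {label}")
    avg = average_recovery(SCORES)
    print("Average recovery:", "*" if avg is None else f"{avg:.2f}%")
```

Because the README does not state its averaging method, the value this sketch prints may differ from the figure in the table's Average row.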