ybelkada DhiyaEddine committed on
Commit 7ce48d8
1 Parent(s): 8c5250f

Update README.md (#4)

- Update README.md (0eed7e57ce365a868257ce04d93d9276fd6703d8)


Co-authored-by: Rhaiem <DhiyaEddine@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -116,7 +116,7 @@ We report in the following table our internal pipeline benchmarks:
  <td>MMLU-PRO (5-shot)</td>
  <td>32.4%</td>
  <td>31.6%</td>
- <td>-</td>
+ <td>31.6%</td>
  <td>29.6%</td>
  <td>26.3%</td>
  </tr>
@@ -124,7 +124,7 @@ We report in the following table our internal pipeline benchmarks:
  <td>IFEval</td>
  <td>69.9%</td>
  <td>65.7%</td>
- <td>-</td>
+ <td>56.8%</td>
  <td>78.6%</td>
  <td>71.7%</td>
  </tr>
@@ -141,7 +141,7 @@ We report in the following table our internal pipeline benchmarks:
  <td>MATH(4-shot)</td>
  <td>-</td>
  <td>6.9%</td>
- <td>-</td>
+ <td>9.44%</td>
  <td>-</td>
  <td>27.3%</td>
  </tr>
@@ -158,7 +158,7 @@ We report in the following table our internal pipeline benchmarks:
  <td>GPQA (0-shot)</td>
  <td>10.3%</td>
  <td>11.1%</td>
- <td>-</td>
+ <td>6.4%</td>
  <td>2.4%</td>
  <td>7.2%</td>
  </tr>
@@ -166,7 +166,7 @@ We report in the following table our internal pipeline benchmarks:
  <td>MUSR (0-shot)</td>
  <td>8.2%</td>
  <td>12.2%</td>
- <td>-</td>
+ <td>7.4%</td>
  <td>8.4%</td>
  <td>8.3%</td>
  </tr>
@@ -174,7 +174,7 @@ We report in the following table our internal pipeline benchmarks:
  <td>BBH (3-shot)</td>
  <td>33.3%</td>
  <td>35.3%</td>
- <td>-</td>
+ <td>37.8%</td>
  <td>29.9%</td>
  <td>25.2%</td>
  </tr>