JingweiZuo DhiyaEddine committed on
Commit
da37de5
1 Parent(s): 3abb65b

Update README.md (#8)

- Update README.md (5535b528b9f3d9bc7a2f0e229e607b0ae533a7d2)


Co-authored-by: Rhaiem <DhiyaEddine@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -105,7 +105,7 @@ We report in the following table our internal pipeline benchmarks. For the bench
  <tr>
  <td rowspan="3">General</td>
  <td>MMLU (5-shot)</td>
- <td>-</td>
  <td>68.7%</td>
  <td>55.9%</td>
  <td>65.3%</td>
@@ -127,14 +127,14 @@ We report in the following table our internal pipeline benchmarks. For the bench
  <tr>
  <td rowspan="2">Math</td>
  <td>GSM8K (5-shot)</td>
- <td>-</td>
  <td>74.9%</td>
  <td>19.2%</td>
  <td>65.2%</td>
  </tr>
  <tr>
  <td>MATH Lvl-5 (4-shot)</td>
- <td>-</td>
  <td>6.9%</td>
  <td>10.4%</td>
  <td>27.3%</td>
@@ -142,7 +142,7 @@ We report in the following table our internal pipeline benchmarks. For the bench
  <tr>
  <td rowspan="4">Reasoning</td>
  <td>Arc Challenge (25-shot)</td>
- <td>-</td>
  <td>54.3%</td>
  <td>46.6%</td>
  <td>53.7%</td>
@@ -171,28 +171,28 @@ We report in the following table our internal pipeline benchmarks. For the bench
  <tr>
  <td rowspan="4">CommonSense Understanding</td>
  <td>PIQA (0-shot)</td>
- <td>-</td>
  <td>82.3%</td>
  <td>78.9%</td>
  <td>80.9%</td>
  </tr>
  <tr>
  <td>SciQ (0-shot)</td>
- <td>-</td>
  <td>94.9%</td>
  <td>80.2%</td>
  <td>93.6%</td>
  </tr>
  <tr>
  <td>Winogrande (0-shot)</td>
- <td>-</td>
  <td>64.5%</td>
  <td>-</td>
  <td>-</td>
  </tr>
  <tr>
  <td>OpenbookQA (0-shot)</td>
- <td>-</td>
  <td>34.6%</td>
  <td>46.2%</td>
  <td>47.2%</td>
 
  <tr>
  <td rowspan="3">General</td>
  <td>MMLU (5-shot)</td>
+ <td>30.6%</td>
  <td>68.7%</td>
  <td>55.9%</td>
  <td>65.3%</td>
 
  <tr>
  <td rowspan="2">Math</td>
  <td>GSM8K (5-shot)</td>
+ <td>0%</td>
  <td>74.9%</td>
  <td>19.2%</td>
  <td>65.2%</td>
  </tr>
  <tr>
  <td>MATH Lvl-5 (4-shot)</td>
+ <td>13.6%</td>
  <td>6.9%</td>
  <td>10.4%</td>
  <td>27.3%</td>
 
  <tr>
  <td rowspan="4">Reasoning</td>
  <td>Arc Challenge (25-shot)</td>
+ <td>54%</td>
  <td>54.3%</td>
  <td>46.6%</td>
  <td>53.7%</td>
 
  <tr>
  <td rowspan="4">CommonSense Understanding</td>
  <td>PIQA (0-shot)</td>
+ <td>75.6%</td>
  <td>82.3%</td>
  <td>78.9%</td>
  <td>80.9%</td>
  </tr>
  <tr>
  <td>SciQ (0-shot)</td>
+ <td>29.2%</td>
  <td>94.9%</td>
  <td>80.2%</td>
  <td>93.6%</td>
  </tr>
  <tr>
  <td>Winogrande (0-shot)</td>
+ <td>75.9%</td>
  <td>64.5%</td>
  <td>-</td>
  <td>-</td>
  </tr>
  <tr>
  <td>OpenbookQA (0-shot)</td>
+ <td>45.6%</td>
  <td>34.6%</td>
  <td>46.2%</td>
  <td>47.2%</td>