ybelkada qiyang-zhao committed on
Commit
941e157
1 Parent(s): 4746ba6

Update README.md (#1)

- Update README.md (992ffa334ad1c28e0918d8c37ca73cd3db04701e)


Co-authored-by: Qiyang Zhao <qiyang-zhao@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +55 -2
README.md CHANGED
@@ -72,8 +72,61 @@ python run_inference.py -m models/Falcon3-10B-1.58bit/ggml-model-i2_s.gguf -p "Y
  ```

  # Evaluation
-
- Coming soon ..
+ The following table reports results from our internal benchmark pipeline:
+
+ **Note: evaluation results are normalized scores from the v2 leaderboard tasks; the results reported for the original models in the blog post are raw scores.**
+
+ <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
+ <colgroup>
+ <col style="width: 10%;">
+ <col style="width: 10%;">
+ <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
+ </colgroup>
+ <thead>
+ <tr>
+ <th>Benchmark</th>
+ <th>Llama3-8B-1.58-100B-tokens</th>
+ <th>Falcon3-10B-Base-1.58bit</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>IFEval</td>
+ <td>17.91</td>
+ <td><b>24.89</b></td>
+ </tr>
+ <tr>
+ <td>MUSR</td>
+ <td><b>4.87</b></td>
+ <td>4.6</td>
+ </tr>
+ <tr>
+ <td>GPQA</td>
+ <td>1.83</td>
+ <td>1.83</td>
+ </tr>
+ <tr>
+ <td>BBH</td>
+ <td><b>5.36</b></td>
+ <td>4.44</td>
+ </tr>
+ <tr>
+ <td>MMLU-PRO</td>
+ <td><b>2.78</b></td>
+ <td>1.36</td>
+ </tr>
+ <tr>
+ <td>MATH</td>
+ <td>0.26</td>
+ <td><b>0.48</b></td>
+ </tr>
+ <tr>
+ <td>Average</td>
+ <td>5.5</td>
+ <td><b>6.27</b></td>
+ </tr>
+ </tbody>
+ </table>

  # Citation
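The README note distinguishes normalized scores from raw scores without spelling out the conversion. The sketch below shows one common way such a normalization is done, assuming the convention of rescaling a raw accuracy so that the random-guess baseline maps to 0 and a perfect score maps to 100. The `normalize` helper and its `baseline` argument are hypothetical names used for illustration only; the commit does not state the exact formula or the per-task baselines that were used. The second half recomputes the Average row from the per-benchmark values in the table.

```python
from statistics import mean


def normalize(raw: float, baseline: float, best: float = 1.0) -> float:
    """Rescale a raw accuracy in [0, 1] to a 0-100 normalized score.

    Assumption (not confirmed by the commit): scores at or below the
    random-guess baseline are clipped to 0, and `best` maps to 100.
    """
    if raw <= baseline:
        return 0.0
    return (raw - baseline) / (best - baseline) * 100.0


# Hypothetical example: a 4-choice task has a random baseline of 0.25.
example = normalize(0.40, baseline=0.25)

# Normalized scores copied from the table (IFEval, MUSR, GPQA, BBH,
# MMLU-PRO, MATH), used to recompute the Average row.
scores = {
    "Llama3-8B-1.58-100B-tokens": [17.91, 4.87, 1.83, 5.36, 2.78, 0.26],
    "Falcon3-10B-Base-1.58bit": [24.89, 4.6, 1.83, 4.44, 1.36, 0.48],
}
averages = {model: round(mean(vals), 2) for model, vals in scores.items()}
print(averages)
```

Run as-is, the recomputed averages are 5.5 and 6.27, which agree with the Average row in the table. Because normalization subtracts the random baseline, a normalized score near 0 on a multiple-choice task means near-chance accuracy rather than near-zero accuracy, which is why these numbers are not directly comparable to raw scores.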