qiyang-zhao committed on
Commit
1eac05f
1 Parent(s): 10514fa

Update README.md

Files changed (1)
  1. README.md +53 -2
README.md CHANGED
@@ -67,8 +67,59 @@ python run_inference.py -m models/Falcon3-7B-Base-1.58bit/ggml-model-i2_s.gguf -
 ```
 
 # Evaluation
-
-Coming soon ..
+We report in the following table our internal pipeline benchmarks:
+
+<table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
+<colgroup>
+<col style="width: 10%;">
+<col style="width: 10%;">
+<col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
+</colgroup>
+<thead>
+<tr>
+<th>Benchmark</th>
+<th>Llama3-8B-1.58-100B-tokens</th>
+<th>Falcon3-7B-Base-1.58bit </th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>IFEval</td>
+<td>17.91</td>
+<td><b>25.43</b></td>
+</tr>
+<tr>
+<td>MUSR</td>
+<td>4.87</td>
+<td><b>5.75</b></td>
+</tr>
+<tr>
+<td>GPQA</td>
+<td>1.83</td>
+<td><b>2.32</b></td>
+</tr>
+<tr>
+<td>BBH</td>
+<td><b>5.36</b></td>
+<td>3.91</td>
+</tr>
+<tr>
+<td>MMLU-PRO</td>
+<td><b>2.78</b></td>
+<td>1.36</td>
+</tr>
+<tr>
+<td>MATH</td>
+<td>0.26</td>
+<td><b>0.88</b></td>
+</tr>
+<tr>
+<td>Average</td>
+<td>5.5</td>
+<td><b>6.61</b></td>
+</tr>
+</tbody>
+</table>
 
 # Citation
 
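Review note: the Average row in the table added by this commit can be sanity-checked with a few lines of Python. This is a minimal sketch that assumes the average is an unweighted arithmetic mean of the six benchmark scores, rounded to two decimals; the commit does not state how the average was computed. The dictionary names and the `average_score` helper are hypothetical and are not part of the repository.

```python
# Scores copied from the table added in this commit.
LLAMA3_8B_158 = {
    "IFEval": 17.91, "MUSR": 4.87, "GPQA": 1.83,
    "BBH": 5.36, "MMLU-PRO": 2.78, "MATH": 0.26,
}
FALCON3_7B_158 = {
    "IFEval": 25.43, "MUSR": 5.75, "GPQA": 2.32,
    "BBH": 3.91, "MMLU-PRO": 1.36, "MATH": 0.88,
}


def average_score(scores: dict[str, float]) -> float:
    """Unweighted mean of the benchmark scores, rounded to 2 decimals.

    Assumption: the README's "Average" row is a plain arithmetic mean.
    """
    return round(sum(scores.values()) / len(scores), 2)


if __name__ == "__main__":
    print("Llama3-8B-1.58-100B-tokens:", average_score(LLAMA3_8B_158))
    print("Falcon3-7B-Base-1.58bit:", average_score(FALCON3_7B_158))
```

Under that assumption the computed means agree with the reported 5.5 and 6.61.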