Files changed (1)
  1. README.md +84 -70
README.md CHANGED
@@ -4,31 +4,26 @@ language:
  tags:
  - falcon3
  - falcon3_mamba
- - falcon_mamba
  base_model:
  - tiiuae/Falcon3-Mamba-7B-Base
- license: other
- license_name: falcon-llm-license
- license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
- library_name: transformers
  ---

  # Falcon3-Mamba-7B-Instruct

  **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.

- This repository contains **Falcon3-Mamba-7B-Instruct**. Compared to SSM-based models of similar size, it achieves state-of-the-art results (at release time) on reasoning, language understanding, instruction following, code, and mathematics tasks.
- Falcon3-Mamba-7B-Instruct supports a context length of up to 32K and was trained mainly on an English corpus.

  ## Model Details
- - Architecture (same as [Falcon-Mamba-7b](https://huggingface.co/tiiuae/falcon-mamba-7b))
  - Mamba1-based, causal decoder-only architecture trained on a causal language modeling task (i.e., predicting the next token)
  - 64 decoder blocks
  - width: 4096
  - state_size: 16
  - 32k context length
  - 65k vocab size
- - Continue-pretrained from [Falcon Mamba 7B](https://huggingface.co/tiiuae/falcon-mamba-7b) with another 1,500 gigatokens of data comprising web, code, STEM, and high-quality data
  - Post-trained on 1.2 million samples of STEM, conversation, code, and safety data
  - Developed by [Technology Innovation Institute](https://www.tii.ae)
  - License: TII Falcon-LLM License 2.0
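The Mamba1 decoder blocks listed above are built around a selective state-space recurrence. Purely as a toy illustration (this is not the repository's implementation; real Mamba blocks add input projections, gating, a local convolution, and a hardware-aware parallel scan), the per-channel recurrence behind a `state_size`-dimensional hidden state can be sketched as:

```python
import numpy as np

def selective_ssm_scan(x, A, B, C, delta):
    """Toy selective SSM recurrence:
        h_t = exp(delta_t * A) * h_{t-1} + (delta_t * B_t) * x_t
        y_t = h_t @ C_t
    Shapes: x (L, d), A (d, n), B and C (L, n), delta (L, d)."""
    L, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))
    y = np.zeros((L, d))
    for t in range(L):
        dA = np.exp(delta[t][:, None] * A)        # (d, n) per-step decay
        dB = delta[t][:, None] * B[t][None, :]    # (d, n) discretized input map
        h = dA * h + dB * x[t][:, None]           # recurrent state update
        y[t] = h @ C[t]                           # per-channel readout
    return y

# Tiny illustrative sizes; the card's real width is 4096 with state_size 16.
L, d, n = 8, 4, 16
rng = np.random.default_rng(0)
y = selective_ssm_scan(rng.standard_normal((L, d)),
                       -np.exp(rng.standard_normal((d, n))),   # A < 0 for stability
                       rng.standard_normal((L, n)),
                       rng.standard_normal((L, n)),
                       np.log1p(np.exp(rng.standard_normal((L, d)))))  # softplus > 0
print(y.shape)  # (8, 4)
```

Because `B`, `C`, and `delta` depend on the input at each step, the state can selectively retain or forget information, which is what lets a fixed-size state cover a 32k context.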
@@ -84,7 +79,7 @@ print(response)
  <br>

  # Benchmarks
- We report in the following table our internal pipeline benchmarks. For the benchmarks marked by a star, we normalize the results with HuggingFace score normalization:

  <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
  <colgroup>
@@ -93,6 +88,7 @@ We report in the following table our internal pipeline benchmarks. For the bench
  <col style="width: 7%;">
  <col style="width: 7%;">
  <col style="width: 7%;">
  <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
  </colgroup>
  <thead>
@@ -101,6 +97,7 @@ We report in the following table our internal pipeline benchmarks. For the bench
  <th>Benchmark</th>
  <th>Zamba2-7B-instruct</th>
  <th>Jamba-1.5-Mini</th>
  <th>Llama-3.1-8B-Instruct</th>
  <th>Falcon3-Mamba-7B-Instruct</th>
  </tr>
@@ -109,105 +106,122 @@ We report in the following table our internal pipeline benchmarks. For the bench
  <tr>
  <td rowspan="3">General</td>
  <td>MMLU (5-shot)</td>
- <td>30.6</td>
- <td>68.7</td>
- <td>55.9</td>
- <td>65.3</td>
  </tr>
  <tr>
- <td>MMLU-PRO (5-shot)*</td>
- <td>32.4</td>
- <td>31.6</td>
- <td>21.8</td>
- <td>26.3</td>
  </tr>
  <tr>
  <td>IFEval</td>
- <td>69.9</td>
- <td>65.7</td>
- <td>78.8</td>
- <td>71.7</td>
  </tr>
  <tr>
  <td rowspan="2">Math</td>
  <td>GSM8K (5-shot)</td>
- <td>0</td>
- <td>74.9</td>
- <td>19.2</td>
- <td>65.2</td>
  </tr>
  <tr>
- <td>MATH Lvl-5 (4-shot)</td>
- <td>13.6</td>
- <td>6.9</td>
- <td>10.4</td>
- <td>27.3</td>
  </tr>
  <tr>
  <td rowspan="4">Reasoning</td>
  <td>Arc Challenge (25-shot)</td>
- <td>54</td>
- <td>54.3</td>
- <td>46.6</td>
- <td>53.7</td>
  </tr>
  <tr>
- <td>GPQA (0-shot)*</td>
- <td>10.3</td>
- <td>11.1</td>
- <td>6.2</td>
- <td>7.2</td>
  </tr>
  <tr>
- <td>MUSR (0-shot)*</td>
- <td>8.2</td>
- <td>12.2</td>
- <td>38.6</td>
- <td>8.3</td>
  </tr>
  <tr>
- <td>BBH (3-shot)*</td>
- <td>33.3</td>
- <td>35.3</td>
- <td>43.7</td>
- <td>25.2</td>
  </tr>
  <tr>
  <td rowspan="4">CommonSense Understanding</td>
  <td>PIQA (0-shot)</td>
- <td>75.6</td>
- <td>82.3</td>
- <td>78.9</td>
- <td>80.9</td>
  </tr>
  <tr>
  <td>SciQ (0-shot)</td>
- <td>29.2</td>
- <td>94.9</td>
- <td>80.2</td>
- <td>93.6</td>
  </tr>
  <tr>
  <td>OpenbookQA (0-shot)</td>
- <td>45.6</td>
- <td>45.8</td>
- <td>46.2</td>
- <td>47.2</td>
  </tr>
  </tbody>
  </table>
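The star-marked rows in the table above are rescaled with HuggingFace score normalization. A minimal sketch of that rescaling, assuming the Open LLM Leaderboard convention (subtract the task's random-guess baseline, clamp below-baseline results to zero, rescale to 0-100); the baseline value here is illustrative, not taken from the card:

```python
def normalize_score(raw, baseline):
    """Map a raw accuracy in [0, 100] so that the random-guess baseline
    becomes 0 and a perfect score becomes 100; below-baseline clamps to 0."""
    if raw <= baseline:
        return 0.0
    return 100.0 * (raw - baseline) / (100.0 - baseline)

# e.g. a 4-way multiple-choice task has a 25% random baseline (illustrative):
print(normalize_score(62.5, 25.0))  # 50.0
```

This is why the starred scores can look much lower than raw accuracies: a raw 26.3% on a 4-way task would sit barely above chance after normalization.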

- ## Useful links
- - View our [release blogpost](https://huggingface.co/blog/falcon3).
- - Feel free to join [our Discord server](https://discord.gg/fwXpMyGc) if you have any questions or want to interact with our researchers and developers.

- ## Citation
- If the Falcon3 family of models was helpful to your work, feel free to cite us.

  ```
  @misc{Falcon3,
- title = {The Falcon 3 Family of Open Models},
- author = {Falcon-LLM Team},
  month = {December},
  year = {2024}
  }
  tags:
  - falcon3
  - falcon3_mamba
  base_model:
  - tiiuae/Falcon3-Mamba-7B-Base
  ---

  # Falcon3-Mamba-7B-Instruct

  **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.

+ This repository contains **Falcon3-Mamba-7B-Instruct**. Compared to SSM-based models of similar size, it achieves state-of-the-art results (at release time) on reasoning, language understanding, instruction following, code, and mathematics tasks.
+ Falcon3-Mamba-7B-Instruct supports a context length of up to 32K and one language (English).

  ## Model Details
+ - Architecture (same as Falcon-Mamba-7b)
  - Mamba1-based, causal decoder-only architecture trained on a causal language modeling task (i.e., predicting the next token)
  - 64 decoder blocks
  - width: 4096
  - state_size: 16
  - 32k context length
  - 65k vocab size
+ - Pretrained on 7 teratokens of data comprising web, code, STEM, and high-quality data, using 2048 H100 GPU chips
  - Post-trained on 1.2 million samples of STEM, conversation, code, and safety data
  - Developed by [Technology Innovation Institute](https://www.tii.ae)
  - License: TII Falcon-LLM License 2.0
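Post-training on conversation data means the model expects chat turns flattened into a single prompt string. The authoritative template ships with the tokenizer (`tokenizer.apply_chat_template`); the sketch below is only a hypothetical illustration of the idea, and the role-marker tokens used here are invented, not Falcon3-Mamba's actual special tokens:

```python
def flatten_chat(messages, add_generation_prompt=True):
    """Hypothetical chat flattening; real models define their own template
    (with their own special tokens) in the tokenizer config."""
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|assistant|>\n")
    return "\n".join(parts)

prompt = flatten_chat([{"role": "user", "content": "How many hours in 3 days?"}])
print(prompt)
```

In practice, always use the tokenizer's own `apply_chat_template` rather than hand-rolling a template, since mismatched role markers degrade instruction following.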
 
  <br>

  # Benchmarks
+ We report in the following table our internal pipeline benchmarks:

  <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
  <colgroup>
  <col style="width: 7%;">
  <col style="width: 7%;">
  <col style="width: 7%;">
+ <col style="width: 7%;">
  <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
  </colgroup>
  <thead>
  <th>Benchmark</th>
  <th>Zamba2-7B-instruct</th>
  <th>Jamba-1.5-Mini</th>
+ <th>Qwen2-7B-Instruct</th>
  <th>Llama-3.1-8B-Instruct</th>
  <th>Falcon3-Mamba-7B-Instruct</th>
  </tr>
  <tr>
  <td rowspan="3">General</td>
  <td>MMLU (5-shot)</td>
+ <td>-</td>
+ <td>68.7%</td>
+ <td>-</td>
+ <td>55.9%</td>
+ <td>-</td>
  </tr>
  <tr>
+ <td>MMLU-PRO (5-shot)</td>
+ <td>32.4%</td>
+ <td>31.6%</td>
+ <td>31.6%</td>
+ <td>21.8%</td>
+ <td>26.3%</td>
  </tr>
  <tr>
  <td>IFEval</td>
+ <td>69.9%</td>
+ <td>65.7%</td>
+ <td>56.8%</td>
+ <td>78.8%</td>
+ <td>71.7%</td>
  </tr>
  <tr>
  <td rowspan="2">Math</td>
  <td>GSM8K (5-shot)</td>
+ <td>-</td>
+ <td>74.9%</td>
+ <td>-</td>
+ <td>19.2%</td>
+ <td>-</td>
  </tr>
  <tr>
+ <td>MATH (4-shot)</td>
+ <td>-</td>
+ <td>6.9%</td>
+ <td>9.44%</td>
+ <td>10.4%</td>
+ <td>27.3%</td>
  </tr>
  <tr>
  <td rowspan="4">Reasoning</td>
  <td>Arc Challenge (25-shot)</td>
+ <td>-</td>
+ <td>54.3%</td>
+ <td>-</td>
+ <td>46.6%</td>
+ <td>-</td>
  </tr>
  <tr>
+ <td>GPQA (0-shot)</td>
+ <td>10.3%</td>
+ <td>11.1%</td>
+ <td>6.4%</td>
+ <td>33.6%</td>
+ <td>7.2%</td>
  </tr>
  <tr>
+ <td>MUSR (0-shot)</td>
+ <td>8.2%</td>
+ <td>12.2%</td>
+ <td>7.4%</td>
+ <td>38.6%</td>
+ <td>8.3%</td>
  </tr>
  <tr>
+ <td>BBH (3-shot)</td>
+ <td>33.3%</td>
+ <td>35.3%</td>
+ <td>37.8%</td>
+ <td>43.7%</td>
+ <td>25.2%</td>
  </tr>
  <tr>
  <td rowspan="4">CommonSense Understanding</td>
  <td>PIQA (0-shot)</td>
+ <td>-</td>
+ <td>82.3%</td>
+ <td>-</td>
+ <td>78.9%</td>
+ <td>-</td>
  </tr>
  <tr>
  <td>SciQ (0-shot)</td>
+ <td>-</td>
+ <td>94.9%</td>
+ <td>-</td>
+ <td>80.2%</td>
+ <td>-</td>
+ </tr>
+ <tr>
+ <td>Winogrande (0-shot)</td>
+ <td>-</td>
+ <td>64.5%</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
  </tr>
  <tr>
  <td>OpenbookQA (0-shot)</td>
+ <td>-</td>
+ <td>34.6%</td>
+ <td>-</td>
+ <td>46.2%</td>
+ <td>-</td>
  </tr>
  </tbody>
  </table>

+ # Citation
+ If the Falcon3 family of models was helpful to your work, feel free to cite us.

  ```
  @misc{Falcon3,
+ title = {The Falcon 3 Family of Open Models},
+ author = {TII Team},
  month = {December},
  year = {2024}
  }