Upload tokenizer_config.json

#1
by fedyanin - opened
README.md CHANGED
@@ -1,226 +1,3 @@
- ---
- language:
- - en
- - fr
- - es
- - pt
- tags:
- - falcon3
- license: other
- license_name: falcon-llm-license
- license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
- library_name: transformers
- ---
-
- <div align="center">
- <img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
- </div>
-
- # Falcon3-7B-Base
-
- **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
-
- This repository contains the **Falcon3-7B-Base**. It achieves state of art results (at the time of release) on reasoning, language understanding, instruction following, code and mathematics tasks.
- Falcon3-7B-Base supports 4 languages (english, french, spanish, portuguese) and a context length up to 32K.
-
- ⚠️ **This is a raw, pretrained model, which should be further finetuned for most usecases.**
-
- ## Model Details
- - Architecture
- - transformer based causal decoder only architecture
- - 28 decoder blocks
- - grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
- - wider head dimension: 256
- - high RoPE value to support long context understanding: 1000042
- - 32k context length
- - 131k vocab size
- - Pretrained on 14 Teratokens of datasets comprising of web, code, STEM, high quality and mutlilingual data using 1024 H100 GPU chips
- - Supports EN, FR, ES, PT
- - Developed by [Technology Innovation Institute](https://www.tii.ae)
- - License: TII Falcon-LLM License 2.0
- - Model Release Date: December 2024
-
-
- ## Getting started
-
- <details>
- <summary> Click to expand </summary>
-
- ```python
- import torch
- from transformers import pipeline
-
- pipe = pipeline(
- "text-generation",
- model="tiiuae/Falcon3-7B-Base",
- torch_dtype=torch.bfloat16,
- device_map="auto"
- )
- response = pipe("Question: How many hours in one day? Answer: ")
- print(response[0]['generated_text'])
- ```
-
- </details>
-
- <br>
-
- ## Benchmarks
- We report in the following table our internal pipeline benchmarks.
- - We use [lm-evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness).
- - We report **raw scores**.
- - We use same batch-size across all models.
-
-
-
- <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
- <colgroup>
- <col style="width: 10%;">
- <col style="width: 10%;">
- <col style="width: 7%;">
- <col style="width: 7%;">
- <col style="width: 7%;">
- <col style="width: 7%;">
- <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
- </colgroup>
- <thead>
- <tr>
- <th>Category</th>
- <th>Benchmark</th>
- <th>Llama3.1-8B</th>
- <th>Qwen2-7B</th>
- <th>Qwen2.5-7B</th>
- <th>gemma-2-9b</th>
- <th>Falcon3-7B-Base</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td rowspan="3">General</td>
- <td>MMLU (5-shot)</td>
- <td>65.2</td>
- <td>70.4</td>
- <td>74.2</td>
- <td>-</td>
- <td>67.5</td>
- </tr>
- <tr>
- <td>MMLU-PRO (5-shot)</td>
- <td>32.7</td>
- <td>42.1</td>
- <td>43.5</td>
- <td>-</td>
- <td>39.2</td>
- </tr>
- <tr>
- <td>IFEval</td>
- <td>12.0</td>
- <td>30.6</td>
- <td>33.9</td>
- <td>-</td>
- <td>34.3</td>
- </tr>
- <tr>
- <td rowspan="2">Math</td>
- <td>GSM8K (5-shot)</td>
- <td>49.4</td>
- <td>77.9</td>
- <td>82.9</td>
- <td>-</td>
- <td>76.2</td>
- </tr>
- <tr>
- <td>MATH(4-shot)</td>
- <td>4.1</td>
- <td>17.5</td>
- <td>15.5</td>
- <td>-</td>
- <td>18.0</td>
- </tr>
- <tr>
- <td rowspan="4">Reasoning</td>
- <td>Arc Challenge (25-shot)</td>
- <td>53.4</td>
- <td>57.4</td>
- <td>59.0</td>
- <td>-</td>
- <td>59.6</td>
- </tr>
- <tr>
- <td>GPQA (0-shot)</td>
- <td>31.0</td>
- <td>31.9</td>
- <td>33.0</td>
- <td>-</td>
- <td>35.5</td>
- </tr>
- <tr>
- <td>MUSR (0-shot)</td>
- <td>38.0</td>
- <td>44.1</td>
- <td>44.2</td>
- <td>-</td>
- <td>47.3</td>
- </tr>
- <tr>
- <td>BBH (3-shot)</td>
- <td>46.5</td>
- <td>53.3</td>
- <td>54.0</td>
- <td>-</td>
- <td>51.0</td>
- </tr>
- <tr>
- <td rowspan="4">CommonSense Understanding</td>
- <td>PIQA (0-shot)</td>
- <td>80.3</td>
- <td>79.8</td>
- <td>78.7</td>
- <td>-</td>
- <td>77.7</td>
- </tr>
- <tr>
- <td>SciQ (0-shot)</td>
- <td>96.3</td>
- <td>95.9</td>
- <td>96.6</td>
- <td>-</td>
- <td>95.3</td>
- </tr>
- <tr>
- <td>Winogrande (0-shot)</td>
- <td>74.0</td>
- <td>72.1</td>
- <td>72.9</td>
- <td>-</td>
- <td>71.0</td>
- </tr>
- <tr>
- <td>OpenbookQA (0-shot)</td>
- <td>33.4</td>
- <td>35.2</td>
- <td>33.6</td>
- <td>-</td>
- <td>31.4</td>
- </tr>
- </tbody>
- </table>
-
- ## Useful links
- - View our [release blogpost](https://huggingface.co/blog/falcon3).
- - Feel free to join [our discord server](https://discord.gg/fwXpMyGc) if you have any questions or to interact with our researchers and developers.
-
- ## Technical Report
-
- Coming soon....
-
- ## Citation
- If Falcon3 family were helpful to your work, feel free to give us a cite.
-
- ```
- @misc{Falcon3,
- title = {Falcon 3 family of Open Foundation Models},
- author = {TII Team},
- month = {December},
- year = {2024}
- }
- ```
+ ---
+ license: apache-2.0
+ ---
config.json CHANGED
@@ -1,9 +1,11 @@
  {
+ "_name_or_path": "falcon3-7b-32k-best",
  "architectures": [
  "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
+ "bos_token_id": 11,
  "eos_token_id": 11,
  "head_dim": 256,
  "hidden_act": "silu",
@@ -22,7 +24,7 @@
  "rope_theta": 1000042,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
- "transformers_version": "4.46.1",
+ "transformers_version": "4.46.2",
  "use_cache": true,
  "vocab_size": 131072
  }
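In unified form, the config.json change does three things: it records a `_name_or_path`, adds an explicit `bos_token_id` matching the existing `eos_token_id` of 11, and bumps `transformers_version` from 4.46.1 to 4.46.2. A minimal sketch of the same edit applied to a plain dict (the dict below is abridged from the diff for illustration; it is not the actual repo file):

```python
import json

# Illustration only: reproduce the config.json edit shown in the diff.
# old_config is abridged from the left-hand side of the diff.
old_config = {
    "architectures": ["LlamaForCausalLM"],
    "attention_bias": False,
    "attention_dropout": 0.0,
    "eos_token_id": 11,
    "head_dim": 256,
    "hidden_act": "silu",
    "rope_theta": 1000042,
    "tie_word_embeddings": False,
    "torch_dtype": "bfloat16",
    "transformers_version": "4.46.1",
    "use_cache": True,
    "vocab_size": 131072,
}

new_config = dict(old_config)
new_config["_name_or_path"] = "falcon3-7b-32k-best"
new_config["bos_token_id"] = 11                 # added: explicit BOS, same id as EOS
new_config["transformers_version"] = "4.46.2"   # version bump

print(json.dumps(new_config, indent=2, sort_keys=True))
```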
generation_config.json CHANGED
@@ -2,5 +2,5 @@
  "_from_model_config": true,
  "bos_token_id": 11,
  "eos_token_id": 11,
- "transformers_version": "4.46.1"
+ "transformers_version": "4.46.2"
  }
model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:53d9c6da709ba945fc753a055e3735dc96778ed7c21fc5e18fda7e46a2ebe558
+ oid sha256:614046fa84e0e1198b7e6724db1e480b936c5ac7b10e71a0a3b597b76c7ed4b2
  size 4938900432
model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f1a991d77660a3415a6cdb23b4cbda1d5f94860902eb178768d19323bb96380c
+ oid sha256:2db97e5afc788c6debe8aa45c76d7ab324c06ff6ef0e93c82652d352f7b7429b
  size 4942085160
model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:64d3170c20cb059ff47b5c4ca8d4d9aa92877a07e2f4deb5d8c2aaf8179c1445
+ oid sha256:54b2927ab6d76174b83c59afbcf047b62c62e9413e7c394e64e3a930bb4753d3
  size 4224838512
model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ee9ce0968e247936874716ae370e08f2239f8eb3f73d21f8664658a52763f360
+ oid sha256:a7bde1914961b72b5ff4b6de14a30e016327ac13bd27a210b82d5ac4aab35ab4
  size 805306496
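The `.safetensors` entries above are Git LFS pointer files rather than the weights themselves: three `key value` lines (`version`, `oid`, `size`), and only the `oid` digest changes in this commit while each shard's size stays identical. A minimal sketch of reading one such pointer (the pointer text is copied from the first shard's new version; `parse_lfs_pointer` is a hypothetical helper for illustration, not part of git-lfs):

```python
# Illustration: parse a Git LFS pointer file of the kind shown in the diff.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:614046fa84e0e1198b7e6724db1e480b936c5ac7b10e71a0a3b597b76c7ed4b2
size 4938900432
"""

def parse_lfs_pointer(text: str) -> dict:
    # Each line is "key value"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)   # e.g. "sha256:<hex digest>"
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),   # size of the real file, in bytes
    }

info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])
</imports>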