puneeshkhanna committed cee7ab0 (parent: ab2a252)

Update README.md
README.md CHANGED

@@ -19,11 +19,11 @@ Falcon3-7B-Base supports 4 languages (english, french, spanish, portuguese) and
 
 ## Model Details
 - Architecture
--
+- Transformer based causal decoder only architecture
 - 28 decoder blocks
--
--
--
+- Grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
+- Wider head dimension: 256
+- High RoPE value to support long context understanding: 1000042
 - 32k context length
 - 131k vocab size
 - Pretrained on 14 Gigatokens of datasets comprising web, code, STEM, high quality and multilingual data using 2048 H100 GPU chips
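For readers who want to check these architecture details against the published checkpoint, here is a minimal sketch (not part of this commit) that reads them from the model config with `transformers`. The repo id `tiiuae/Falcon3-7B-Base` and the assumption that the checkpoint exposes a standard Llama-style config (`num_hidden_layers`, `num_key_value_heads`, `rope_theta`, etc.) are mine, not taken from the diff.

```python
# Sketch: map the README's architecture bullets onto Hugging Face config fields.
# Assumes the checkpoint is published as "tiiuae/Falcon3-7B-Base" and uses a
# standard Llama-style config layout.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/Falcon3-7B-Base")

print(config.num_hidden_layers)        # expected: 28 decoder blocks
print(config.num_attention_heads)      # expected: 12 query heads (GQA)
print(config.num_key_value_heads)      # expected: 4 KV heads (GQA)
print(config.hidden_size // config.num_attention_heads)  # expected: 256 head dimension
print(config.rope_theta)               # expected: 1000042 (high RoPE base for long context)
print(config.max_position_embeddings)  # expected: ~32k context length
print(config.vocab_size)               # expected: ~131k vocab size
```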