munish0838 committed
Commit d5bac53 • 1 Parent(s): b0e6b75
Upload README.md with huggingface_hub
README.md CHANGED
@@ -339,7 +339,7 @@ Granite-3.0-1B-A400M-Instruct is based on a decoder-only sparse Mixture of Exper
 | Initialization std | 0.1 | 0.1 | **0.1** | 0.1 |
 | Sequence Length | 4096 | 4096 | **4096** | 4096 |
 | Position Embedding | RoPE | RoPE | **RoPE** | RoPE |
-| #
+| # Parameters | 2.5B | 8.1B | **1.3B** | 3.3B |
 | # Active Parameters | 2.5B | 8.1B | **400M** | 800M |
 | # Training tokens | 12T | 12T | **10T** | 10T |
 