Update README.md
README.md
CHANGED
@@ -74,7 +74,7 @@ Granite-3.1-1B-A400M-Base is based on a decoder-only sparse Mixture of Experts (
 | Number of experts | β | β | **32** | 40 |
 | MoE TopK | β | β | **8** | 8 |
 | Initialization std | 0.1 | 0.1 | **0.1** | 0.1 |
-| Sequence length | 128K |
+| Sequence length | 128K | 128K | **128K** | 128K |
 | Position embedding | RoPE | RoPE | **RoPE** | RoPE |
 | # Parameters | 2.5B | 8.1B | **1.3B** | 3.3B |
 | # Active parameters | 2.5B | 8.1B | **400M** | 800M |