Update README.md
README.md
@@ -8,7 +8,7 @@ Zamba2-1.2B is a hybrid model composed of state-space and transformer blocks. It
 1.) Mamba1 blocks have been replaced with Mamba2 blocks.

-2.) We apply a LoRA projector to each shared MLP and
+2.) We apply a LoRA projector to each shared MLP and attention block, which allows the network to specialize at each invocation of the shared transformer layer across depth. LoRA enables us to add depth-specialization for only a minimal increase in total parameter count.

 3.) We utilize rotary position embeddings in the shared attention layer.
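The mechanism described in item 2 can be illustrated with a minimal PyTorch sketch. This is not the actual Zamba2 code; the class and parameter names are hypothetical. The idea is that one full-rank weight is shared across every invocation of the layer, while each depth contributes only a low-rank (LoRA) update, so depth-specialization costs few extra parameters.

```python
# Minimal sketch (hypothetical, not the Zamba2 implementation) of a shared linear
# layer with per-depth LoRA adapters for depth-specialization.
import torch
import torch.nn as nn

class SharedLinearWithDepthLoRA(nn.Module):
    def __init__(self, d_in, d_out, num_depths, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)              # weight shared across all depths
        self.lora_A = nn.Parameter(torch.zeros(num_depths, rank, d_in))
        self.lora_B = nn.Parameter(torch.zeros(num_depths, d_out, rank))
        nn.init.normal_(self.lora_A, std=0.02)                      # lora_B stays zero, so the update is a no-op at init

    def forward(self, x, depth):
        # y = x W^T + x A_d^T B_d^T : shared computation plus a depth-specific low-rank update
        delta = x @ self.lora_A[depth].T @ self.lora_B[depth].T
        return self.base(x) + delta

# Parameter overhead is only num_depths * rank * (d_in + d_out)
# on top of the single shared d_in * d_out weight.
```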