Update README.md
README.md
@@ -8,7 +8,7 @@ Zamba2-1.2B is a hybrid model composed of state-space and transformer blocks. It
 1.) Mamba1 blocks have been replaced with Mamba2 blocks.

-2.) We apply a LoRA projector to each shared MLP and
+2.) We apply a LoRA projector to each shared MLP and attention block, which allows the network to specialize at each invocation of the shared transformer layer across depth. LoRA enables us to add depth-specialization for only a minimal increase in total parameter count.

 3.) We utilize rotary position embeddings in the shared attention layer.
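The mechanism described in item 2 can be illustrated with a minimal PyTorch sketch. This is not the actual Zamba2 code; the class and parameter names are hypothetical. The idea is that one full-rank weight is shared across every invocation of the layer, while each depth contributes only a low-rank (LoRA) update, so depth-specialization costs few extra parameters.

```python
# Minimal sketch (hypothetical, not the Zamba2 implementation) of a shared linear
# layer with per-depth LoRA adapters for depth-specialization.
import torch
import torch.nn as nn

class SharedLinearWithDepthLoRA(nn.Module):
    def __init__(self, d_in, d_out, num_depths, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)              # weight shared across all depths
        self.lora_A = nn.Parameter(torch.zeros(num_depths, rank, d_in))
        self.lora_B = nn.Parameter(torch.zeros(num_depths, d_out, rank))
        nn.init.normal_(self.lora_A, std=0.02)                      # lora_B stays zero, so the update is a no-op at init

    def forward(self, x, depth):
        # y = x W^T + x A_d^T B_d^T : shared computation plus a depth-specific low-rank update
        delta = x @ self.lora_A[depth].T @ self.lora_B[depth].T
        return self.base(x) + delta

# Parameter overhead is only num_depths * rank * (d_in + d_out)
# on top of the single shared d_in * d_out weight.
```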