Update README.md
README.md
CHANGED
@@ -60,7 +60,7 @@ print(tokenizer.decode(outputs[0]))

## Model Details

-Zamba2-1.2B utilizes and extends our original Zamba hybrid SSM-attention architecture. The core Zamba architecture consists of a backbone of Mamba layers interleaved with one or more shared attention layers
+Zamba2-1.2B utilizes and extends our original Zamba hybrid SSM-attention architecture. The core Zamba architecture consists of a backbone of Mamba layers interleaved with one or more shared attention layers. This attention has shared weights to minimize the parameter cost of the model. We find that concatenating the original model embeddings to the input to this attention block improves performance, likely due to better maintenance of information across depth. The Zamba2 architecture also applies LoRA projection matrices to the shared transformer blocks to gain some additional expressivity in each block and allow each shared block to specialize slightly to its own unique position while keeping the additional parameter overhead small.

<center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/Vay6htbnBcySR3Z6NEgwj.png" width="300" alt="Zamba architecture">
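As a rough illustration of the idea described in the added paragraph, here is a minimal, self-contained PyTorch sketch of a single attention block whose weights are reused at several depths, which receives the hidden state concatenated with the original embeddings, and which carries small per-position LoRA projections. The class, argument names, and sizes (`SharedAttentionBlock`, `lora_rank`, and so on) are illustrative assumptions and are not taken from the actual Zamba2 implementation.

```python
import torch
import torch.nn as nn


class SharedAttentionBlock(nn.Module):
    """Illustrative sketch (not Zamba2's code): one attention block whose
    weights are reused at several depths, with a small per-position LoRA
    adapter so each reuse site can specialize slightly."""

    def __init__(self, d_model: int, n_heads: int, n_positions: int, lora_rank: int = 8):
        super().__init__()
        # The block sees the current hidden state concatenated with the
        # original token embeddings, hence 2 * d_model input features.
        self.attn = nn.MultiheadAttention(2 * d_model, n_heads, batch_first=True)
        self.out_proj = nn.Linear(2 * d_model, d_model)
        # One low-rank (down, up) pair per shared position; the extra
        # parameter count stays small because lora_rank << d_model.
        self.lora_down = nn.ModuleList(
            [nn.Linear(2 * d_model, lora_rank, bias=False) for _ in range(n_positions)]
        )
        self.lora_up = nn.ModuleList(
            [nn.Linear(lora_rank, d_model, bias=False) for _ in range(n_positions)]
        )

    def forward(self, hidden: torch.Tensor, embeddings: torch.Tensor, position: int) -> torch.Tensor:
        # Concatenate the original embeddings onto the hidden state so the
        # shared attention keeps access to the raw input across depth.
        x = torch.cat([hidden, embeddings], dim=-1)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        # Shared output projection plus the position-specific LoRA correction.
        delta = self.lora_up[position](self.lora_down[position](attn_out))
        return hidden + self.out_proj(attn_out) + delta


# Toy usage: batch of 2 sequences, length 16, width 64, block shared at 3 depths.
block = SharedAttentionBlock(d_model=64, n_heads=8, n_positions=3)
hidden = torch.randn(2, 16, 64)
embeddings = torch.randn(2, 16, 64)
out = block(hidden, embeddings, position=1)
print(out.shape)  # torch.Size([2, 16, 64])
```

In this sketch the attention and output-projection weights are the shared parameters, while only the tiny per-position LoRA pairs differ between reuse sites, mirroring the trade-off the paragraph describes between parameter cost and per-block expressivity.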