Update README.md

README.md CHANGED
@@ -82,6 +82,9 @@ Zamba2-1.2B utilizes and extends our original Zamba hybrid SSM-attention architecture
 
 Zamba2-1.2B achieves leading and state-of-the-art performance among models of <2B parameters and is competitive with some models of significantly greater size. Moreover, due to its unique hybrid SSM architecture, Zamba2-1.2B achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.
 
+<img src="https://cdn-uploads.huggingface.co/production/uploads/64e40335c0edca443ef8af3e/t7et3jazHNvxKSkeorZuo.png" width="600"/>
+
+
 Zamba2-1.2B's high performance and small inference compute and memory footprint render it an ideal generalist model for on-device applications.
 
 <center>
@@ -89,10 +92,6 @@ Zamba2-1.2B's high performance and small inference compute and memory footprint
 </center>
 
 
-<center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/JVZUvVMPIpIJy9RDyohMJ.png" width="800" alt="Zamba performance">
-</center>
-
 Time to First Token (TTFT) | Output Generation
 :-------------------------:|:-------------------------:
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/5lpWDLdtPPVAk8COJq7gZ.png) | ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/V2tS6eCOGbpKybEoZmOB7.png)
@@ -103,8 +102,3 @@ And memory overhead
 <center>
 <img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/m0YUmAmiVnRg6l9m10CEt.png" width="400" alt="Zamba inference and memory cost">
 </center>
-
-
-## Notice
-
-Zamba2-1.2B is a pretrained base model and therefore does not have any moderation mechanism and may output toxic or otherwise harmful language. In addition, one should not expect good instruct or chat performance, as this model was not fine-tuned for instruction following or chat.
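The memory claim in the diff above comes down to simple arithmetic: a transformer's KV cache grows linearly with the number of tokens it has processed, while an SSM layer carries a fixed-size recurrent state regardless of sequence length. A rough back-of-envelope sketch of that gap, using hypothetical dimensions chosen purely for illustration (not Zamba2-1.2B's actual configuration):

```python
def kv_cache_bytes(seq_len, n_layers, n_heads, head_dim, bytes_per_elem=2):
    """Transformer KV cache: two tensors (K and V) of shape
    (n_layers, seq_len, n_heads, head_dim) in fp16 by default.
    Grows linearly with seq_len."""
    return 2 * n_layers * seq_len * n_heads * head_dim * bytes_per_elem

def ssm_state_bytes(n_layers, d_model, d_state, bytes_per_elem=2):
    """SSM recurrent state: a fixed (d_model x d_state) matrix per layer,
    independent of how many tokens have been generated."""
    return n_layers * d_model * d_state * bytes_per_elem

# Hypothetical small-model dimensions, for illustration only.
layers, heads, head_dim = 24, 16, 64
d_model, d_state = heads * head_dim, 16

for seq_len in (1_024, 4_096, 16_384):
    kv = kv_cache_bytes(seq_len, layers, heads, head_dim)
    ssm = ssm_state_bytes(layers, d_model, d_state)
    print(f"{seq_len:>6} tokens: KV cache {kv / 2**20:7.1f} MiB "
          f"vs SSM state {ssm / 2**20:5.2f} MiB")
```

At these illustrative sizes the KV cache grows by hundreds of MiB as the context lengthens while the SSM state stays fixed below 1 MiB. A hybrid model like Zamba2 still keeps a cache for its shared attention blocks, so the real gap is smaller than this pure-SSM sketch, but this is the qualitative effect the memory-overhead plot is showing.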