updated readme
README.md
```diff
@@ -56,7 +56,7 @@ source repo: [BSC-LT/salamandra-2b-instruct](https://huggingface.co/BSC-LT/salamandra-2b-instruct)
 ### **Notes:**

 - **Recommended Quantizations:**
-  - **Q4_K_S:** Although it offers good size reduction with minimal PPL impact, it is superseded by stronger choices like Q5_K_M and Q6_K.
+  - **Q4_K_S:** Although it offers good size reduction with minimal PPL impact, it is superseded by stronger choices like Q5_K_M and Q6_K; however, it is the only quantization with minimal PPL impact below 2 GB.
   - **Q5_K_M:** Offers the best balance between low perplexity and reduced file size above Q4, making it ideal for most applications.
   - **Q6_K:** Delivers nearly lossless performance compared to bf16 at a reduced file size (2.4 GB vs. 4.2 GB). Ideal for scenarios requiring maximum accuracy with some size savings.
 - **Non-recommended Quantizations:**
```
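As a usage sketch for the recommended Q5_K_M quantization, the file can be fetched and run with llama.cpp's `llama-cli`. The filenames below are assumptions for illustration; check the repository's file list for the actual GGUF names.

```shell
# Download the Q5_K_M quant from this repo (filename is hypothetical --
# substitute the actual GGUF file listed in the repository).
huggingface-cli download <this-repo> salamandra-2b-instruct.Q5_K_M.gguf --local-dir .

# Run it with llama.cpp: -m selects the model file, -p supplies a prompt.
./llama-cli -m salamandra-2b-instruct.Q5_K_M.gguf -p "Hello, how are you?"
```

Q6_K can be substituted the same way when maximum accuracy matters more than the extra file size.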