alvarobartt (HF staff) committed on
Commit c7ccd67
1 Parent(s): 57b35cf

Update README.md

Files changed (1):
  1. README.md +3 -0
README.md CHANGED
@@ -13,6 +13,9 @@ tags:
  > [!IMPORTANT]
  > This repository is a community-driven quantized version of the original model [`google/gemma-2-9b-it`](https://huggingface.co/google/gemma-2-9b-it), which is the BF16 half-precision official version released by Google.

+ > [!WARNING]
+ > This model has been quantized using `transformers` 4.45.0, meaning that the tokenizer available in this repository won't be compatible with lower versions. The same applies to, e.g., Text Generation Inference (TGI), which only installs `transformers` 4.45.0 or higher starting in v2.3.1.
+
  ## Model Information

  Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources, such as a laptop, desktop, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.
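The warning added in this commit pins a minimum `transformers` version (4.45.0) for loading the tokenizer. A minimal sketch of how a downstream script could check that constraint before loading — the helper names here are hypothetical, and a real implementation should prefer `packaging.version` over this plain `"X.Y.Z"` parser:

```python
# Hypothetical pre-flight check for the `transformers` >= 4.45.0 requirement
# mentioned in the README warning. Only handles plain "X.Y.Z" version strings;
# use `packaging.version.Version` for rc/dev builds and other edge cases.

def parse_version(v: str) -> tuple:
    """Split "4.45.0" into (4, 45, 0) so tuples compare numerically."""
    return tuple(int(part) for part in v.split("."))

REQUIRED = parse_version("4.45.0")

def tokenizer_compatible(installed: str) -> bool:
    """Return True if the given transformers version meets the minimum."""
    return parse_version(installed) >= REQUIRED

if __name__ == "__main__":
    for candidate in ("4.44.2", "4.45.0", "4.46.1"):
        status = "ok" if tokenizer_compatible(candidate) else "too old"
        print(f"transformers {candidate}: {status}")
```

Tuple comparison makes `(4, 45, 0)` sort after `(4, 44, 2)` numerically, avoiding the classic string-comparison bug where `"4.9" > "4.45"`.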