---
license: gemma
language:
- en
tags:
- conversational
quantized_by: qnixsynapse
---
## Llama.cpp quantizations of the official gemma-2-9b-it GGUF from the Kaggle repo
Using llama.cpp PR 8156 for quantization.
Original model: https://huggingface.co/google/gemma-2-9b-it
## Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download qnixsynapse/Gemma-V2-9B-Instruct-GGUF --include "" --local-dir ./
```
or you can download directly.
## Prompt format
The prompt format is the same as Gemma v1; however, it is not included in the GGUF file. It can be added later with the gguf script by setting a new `chat_template` metadata key.
```
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
```
The model should stop at either `<end_of_turn>` or `<eos>`. If it doesn't, the stop tokens need to be added to the GGUF metadata.
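As a small illustration, the template can be applied with a Python helper. The function name and the multi-turn handling are my own sketch; only the control tokens come from the Gemma chat format itself:

```python
def build_gemma_prompt(user_message: str, history=None) -> str:
    """Format a prompt using the Gemma chat template.

    `history` is an optional list of (user, model) turn pairs; this
    helper is illustrative, not part of llama.cpp or the model repo.
    """
    prompt = ""
    for user_turn, model_turn in history or []:
        prompt += f"<start_of_turn>user\n{user_turn}<end_of_turn>\n"
        prompt += f"<start_of_turn>model\n{model_turn}<end_of_turn>\n"
    # Final user turn; the model turn is left open for generation.
    prompt += f"<start_of_turn>user\n{user_message}<end_of_turn>\n<start_of_turn>model\n"
    return prompt

print(build_gemma_prompt("Hello!"))
```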
## Quants
Currently only two quants are available:
| Quant  | Size   |
|--------|--------|
| Q4_K_S | 5.5 GB |
| Q3_K_M | 4.8 GB |
If Q4_K_S causes an out-of-memory error when offloading all layers to the GPU, consider decreasing the batch size or using Q3_K_M instead.
Minimum VRAM needed: 8GB
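As a rough back-of-the-envelope check, you can estimate whether a quant fits in VRAM. The 1.5 GB runtime overhead figure (KV cache, compute buffers) is an assumption for illustration, not a measured value:

```python
def fits_in_vram(model_size_gb: float, vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """Return True if the model plus an assumed runtime overhead
    (KV cache, compute buffers) fits in the available VRAM."""
    return model_size_gb + overhead_gb <= vram_gb

# Q4_K_S (5.5 GB) fits in 8 GB under the assumed 1.5 GB overhead;
# with a larger overhead (bigger batch size or context) it may not,
# which is when dropping to Q3_K_M (4.8 GB) helps.
print(fits_in_vram(5.5, 8.0))       # True under the assumed overhead
print(fits_in_vram(5.5, 8.0, 3.0))  # False
print(fits_in_vram(4.8, 8.0, 3.0))  # True
```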