---
license: gemma
base_model:
- ifable/gemma-2-Ifable-9B
pipeline_tag: text-generation
---
|
|
|
## Llama.cpp imatrix quants of gemma-2-Ifable-9B |
|
|
|
Using [llama.cpp](https://github.com/ggerganov/llama.cpp/) release [b3804](https://github.com/ggerganov/llama.cpp/releases/tag/b3804) for quantization.
|
|
|
Original model: https://huggingface.co/ifable/gemma-2-Ifable-9B |
|
|
|
All quants were made using the imatrix option, except BF16, which is the unquantized original model. The imatrix was generated from the calibration dataset [here](https://gist.github.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c), using the BF16 GGUF with a context size of 8192 tokens (the default is 512, but a context size at or above the model's own should improve quality) and 13 chunks.
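
For reference, the imatrix invocation looks roughly like the sketch below. The file names (`gemma-2-Ifable-9B-BF16.gguf`, `calibration.txt`, `imatrix.dat`) are placeholders, not the exact names used here:

```bash
# Generate an importance matrix from the BF16 GGUF with llama.cpp's
# llama-imatrix tool; -c 8192 matches the context size noted above and
# --chunks 13 limits processing to 13 chunks of the calibration text.
./llama-imatrix \
  -m gemma-2-Ifable-9B-BF16.gguf \
  -f calibration.txt \
  -o imatrix.dat \
  -c 8192 \
  --chunks 13
```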
|
|
|
How to make your own quants: |
|
|
|
https://github.com/ggerganov/llama.cpp/tree/master/examples/imatrix |
|
|
|
https://github.com/ggerganov/llama.cpp/tree/master/examples/quantize |
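
Putting the two steps together, this is a minimal sketch of producing one imatrix quant from the BF16 GGUF; the quant type `Q4_K_M` and the file names are illustrative:

```bash
# Quantize the BF16 GGUF to Q4_K_M, guided by the importance matrix
# produced in the previous step.
# Usage: llama-quantize [--imatrix file] input.gguf output.gguf type
./llama-quantize \
  --imatrix imatrix.dat \
  gemma-2-Ifable-9B-BF16.gguf \
  gemma-2-Ifable-9B-Q4_K_M.gguf \
  Q4_K_M
```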