Hampetiudo/gemma-2-Ifable-9B-i1-GGUF

Hampetiudo committed on Sep 24

Commit 8750c44 • 1 Parent(s): 7138594

Update README.md

Files changed (1)

README.md +7 -1

@@ -11,4 +11,10 @@ Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a
 Original model: https://huggingface.co/ifable/gemma-2-Ifable-9B
-Both quants were made using the imatrix option. The imatrix was generated with the dataset from [here](https://gist.github.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c), using the BF16 GGUF with a context size of 8192 tokens (default is 512 but higher/same as model context size should improve quality) and 13 chunks.
+All quants were made using the imatrix option (except BF16, that's the original model). The imatrix was generated with the dataset from [here](https://gist.github.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c), using the BF16 GGUF with a context size of 8192 tokens (default is 512 but higher/same as model context size should improve quality) and 13 chunks.
+How to make your own quants:
+https://github.com/ggerganov/llama.cpp/tree/master/examples/imatrix
+https://github.com/ggerganov/llama.cpp/tree/master/examples/quantize
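
As a rough sketch of the two steps those links cover, the Python snippet below assembles `llama-imatrix` and `llama-quantize` command lines for a setup like the one the README describes. The file names (`model-BF16.gguf`, `calibration.txt`, `imatrix.dat`) and the `Q4_K_M` target type are placeholders, not values taken from this repository, and the 13 chunks are assumed to follow from the dataset length at a context of 8192 rather than from an explicit flag. The snippet only builds and prints the commands; running them requires a local llama.cpp build.

```python
import shlex


def imatrix_cmd(model, dataset, out="imatrix.dat", ctx=8192):
    """Build a llama-imatrix invocation.

    -m: source GGUF (the README used the BF16 GGUF)
    -f: calibration text file
    -o: where to write the importance matrix
    -c: context size per chunk (README: 8192 instead of the default 512)
    """
    return ["llama-imatrix", "-m", model, "-f", dataset, "-o", out, "-c", str(ctx)]


def quantize_cmd(imatrix, src, dst, qtype):
    """Build a llama-quantize invocation that uses the importance matrix."""
    return ["llama-quantize", "--imatrix", imatrix, src, dst, qtype]


# Hypothetical file names and quant type, for illustration only.
steps = [
    imatrix_cmd("model-BF16.gguf", "calibration.txt"),
    quantize_cmd("imatrix.dat", "model-BF16.gguf", "model-Q4_K_M.gguf", "Q4_K_M"),
]

for step in steps:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(step))
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` later, or to loop `quantize_cmd` over several quant types while reusing one imatrix file.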