Hampetiudo committed on
Commit 8750c44 • 1 Parent(s): 7138594
Update README.md
README.md CHANGED
@@ -11,4 +11,10 @@ Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a
 
 Original model: https://huggingface.co/ifable/gemma-2-Ifable-9B
 
-
+All quants were made using the imatrix option (except BF16, which is the original model). The imatrix was generated with the dataset from [here](https://gist.github.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c), using the BF16 GGUF with a context size of 8192 tokens and 13 chunks. The default context size is 512, but a value equal to or higher than the model's context size should improve quality.
+
+How to make your own quants:
+
+https://github.com/ggerganov/llama.cpp/tree/master/examples/imatrix
+
+https://github.com/ggerganov/llama.cpp/tree/master/examples/quantize
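
The two linked llama.cpp examples correspond to two steps: generating an importance matrix from the BF16 GGUF, then quantizing with it. The sketch below only assembles and prints the command lines, so nothing is executed. It assumes the `llama-imatrix` and `llama-quantize` binaries from a recent llama.cpp build are on your `PATH`. All file names and the `Q4_K_M` target type are hypothetical placeholders, not the ones used for this repository. The context size (8192) and chunk count (13) are the values stated in the README text.

```python
import shlex

# Hypothetical file names (assumptions for illustration); substitute your own.
BF16_GGUF = "model-BF16.gguf"
CALIBRATION_TXT = "calibration_data.txt"
IMATRIX_FILE = "imatrix.dat"


def imatrix_command(model, dataset, out, ctx_size=8192, chunks=13):
    """Build the llama-imatrix argument list (step 1: compute the imatrix)."""
    return [
        "llama-imatrix",
        "-m", model,              # input model, here the BF16 GGUF
        "-f", dataset,            # plain-text calibration dataset
        "-o", out,                # where to write the importance matrix
        "-c", str(ctx_size),      # context size per chunk
        "--chunks", str(chunks),  # maximum number of chunks to process
    ]


def quantize_command(model, imatrix, out, quant_type):
    """Build the llama-quantize argument list (step 2: quantize with the imatrix)."""
    return ["llama-quantize", "--imatrix", imatrix, model, out, quant_type]


# Print the commands so they can be reviewed before being run in a shell.
print(shlex.join(imatrix_command(BF16_GGUF, CALIBRATION_TXT, IMATRIX_FILE)))
print(shlex.join(quantize_command(BF16_GGUF, IMATRIX_FILE, "model-Q4_K_M.gguf", "Q4_K_M")))
```

Check both commands against the `--help` output of your llama.cpp build before running them, since option names can change between releases.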