Update README.md
README.md
CHANGED
@@ -33,10 +33,13 @@ python3 ./path-to-llama.cpp/gguf-py/scripts/gguf-set-metadata.py $file tokenizer
 
 
 Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [0d56246f4b9764158525d894b96606f6163c53a8](https://github.com/ggerganov/llama.cpp/commit/0d56246f4b9764158525d894b96606f6163c53a8) (master from 2024-04-18)
-
-
+
+I cherry-picked tokenizer fixes from [this](https://github.com/ggerganov/llama.cpp/pull/6745) branch to get it to work.
 
-
+The quants use an importance matrix to improve quantization loss.
+
+Using this command to generate the importance matrix from the f16.gguf with [this](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
+dataset.
 
 ```
 ./imatrix -c 512 -m $model_name-f16.gguf -f $llama_cpp_path/groups_merged.txt -o $out_path/imat-f16-gmerged.dat
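
For reference, the `imatrix` invocation above uses llama.cpp's standard options: `-c 512` sets the context size used while collecting activation statistics, `-m` names the f16 model, `-f` is the calibration text file, and `-o` is where the matrix is written. The commit doesn't show the quantization step itself, but a typical way to consume the generated `.dat` file is to pass it to llama.cpp's `quantize` tool via its `--imatrix` flag; the quant type and output filename below are illustrative, not taken from this repo:

```
# Hypothetical follow-up step (not part of this commit): quantize the f16
# model using the importance matrix generated above. Q4_K_M is only an
# example quant type; other imatrix-aware types work the same way.
./quantize --imatrix $out_path/imat-f16-gmerged.dat \
    $model_name-f16.gguf $model_name-Q4_K_M.gguf Q4_K_M
```

Generating the matrix once from the f16 model and reusing it across all quant types is the usual workflow, since the lower-bit quants benefit most from the importance weighting.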