Update README.md
README.md
CHANGED
@@ -33,10 +33,13 @@ python3 ./path-to-llama.cpp/gguf-py/scripts/gguf-set-metadata.py $file tokenizer
 
 
 Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [0d56246f4b9764158525d894b96606f6163c53a8](https://github.com/ggerganov/llama.cpp/commit/0d56246f4b9764158525d894b96606f6163c53a8) (master from 2024-04-18)
-
-
+
+I cherry-picked tokenizer fixes from [this](https://github.com/ggerganov/llama.cpp/pull/6745) branch to get it to work.
 
-
+The quants use an importance matrix to improve quantization loss.
+
+Using this command to generate the importance matrix from the f16.gguf with [this](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
+dataset.
 
 ```
 ./imatrix -c 512 -m $model_name-f16.gguf -f $llama_cpp_path/groups_merged.txt -o $out_path/imat-f16-gmerged.dat
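
For reference, the `imatrix` invocation above uses llama.cpp's standard options: `-c 512` sets the context size used while collecting activation statistics, `-m` names the f16 model, `-f` is the calibration text file, and `-o` is where the matrix is written. The commit doesn't show the quantization step itself, but a typical way to consume the generated `.dat` file is to pass it to llama.cpp's `quantize` tool via its `--imatrix` flag; the quant type and output filename below are illustrative, not taken from this repo:

```
# Hypothetical follow-up step (not part of this commit): quantize the f16
# model using the importance matrix generated above. Q4_K_M is only an
# example quant type; other imatrix-aware types work the same way.
./quantize --imatrix $out_path/imat-f16-gmerged.dat \
    $model_name-f16.gguf $model_name-Q4_K_M.gguf Q4_K_M
```

Generating the matrix once from the f16 model and reusing it across all quant types is the usual workflow, since the lower-bit quants benefit most from the importance weighting.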