Update README.md
README.md
CHANGED
@@ -53,12 +53,23 @@ This repo contains GGUF format model files for [brucethemoose's Yi 34B 200K DARE
 
 <!-- description end -->
 <!-- README_GGUF.md-about-gguf start -->
-###
+### New GGUF formats
 
-
+### New GGUF formats
 
-
+The GGUF files in this repo were made using new k-quant methods, added Jan 2024.
 
+They will only be compatible with llama.cpp from Jan 4th onwards. Other clients may not have been updated for support yet.
+
+The new GGUF k-quant method enables use of an "importance matrix", which is similar in concept to the calibration datasets used by GPTQ, AWQ and EXL2. This improves GGUF quantization quality.
+
+The dataset used for generating the importance matrix for these GGUFs was: VMware open-instruct (5K lines).
+
+Use of the importance matrix enables providing new quant formats: IQ2_XXS, IQ2_XS and Q2_K_S.
+
+Note: adding support for this new GGUF quant method is still a work-in-progress for me. Other GGUF repos I'm creating won't necessarily have this, at least for the next couple of days.
+
+### Clients with GGUF support (not tested with this GGUF quant format specifically, yet)
 * [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option.
 * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
 * [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling.