TheBloke committed
Commit 96c248e
1 parent: cb5eec6

Update README.md

Files changed (1): README.md (+14 -3)
README.md CHANGED
@@ -53,12 +53,21 @@ This repo contains GGUF format model files for [brucethemoose's Yi 34B 200K DARE
 
 <!-- description end -->
 <!-- README_GGUF.md-about-gguf start -->
-### About GGUF
+### New GGUF formats
 
-GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.
+The GGUF files in this repo were made using new k-quant methods, added in January 2024.
 
-Here is an incomplete list of clients and libraries that are known to support GGUF:
+They are only compatible with llama.cpp from January 4th 2024 onwards. Other clients may not have been updated to support them yet.
+
+The new GGUF k-quant method enables the use of an "importance matrix", similar in concept to the calibration datasets used by GPTQ, AWQ and EXL2. This improves GGUF quantization quality.
+
+The dataset used to generate the importance matrix for these GGUFs was VMware open-instruct (5K lines).
+
+Use of the importance matrix enables new quant formats: IQ2_XXS, IQ2_XS and Q2_K_S.
+
+Note: adding support for this new GGUF quant method is still a work in progress for me. Other GGUF repos I'm creating won't necessarily have it, at least for the next couple of days.
 
+### Clients with GGUF support (not yet tested with this quant format specifically)
 * [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option.
 * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
 * [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for storytelling.
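
The importance-matrix workflow the added README text describes can be sketched with llama.cpp's own tools. This is a minimal sketch, not the author's exact commands: it assumes a llama.cpp build from January 4th 2024 or later (which ships the `imatrix` and `quantize` tools), and the file names `model-f16.gguf` and `open-instruct.txt` are hypothetical placeholders.

```shell
# Build llama.cpp (Jan 4th 2024 or later) to get the imatrix and quantize tools.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make imatrix quantize

# 1. Compute an importance matrix by running the full-precision model over a
#    calibration text file (analogous to the VMware open-instruct data the
#    README mentions). File names here are placeholders.
./imatrix -m model-f16.gguf -f open-instruct.txt -o imatrix.dat

# 2. Quantize with the importance matrix to one of the new low-bit formats
#    (IQ2_XXS, IQ2_XS or Q2_K_S).
./quantize --imatrix imatrix.dat model-f16.gguf model-iq2_xs.gguf IQ2_XS
```

The importance matrix only has to be computed once per base model; it can then be reused to produce each of the new quant types from the same f16 GGUF.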