TheBloke committed
Commit 96c248e
1 parent: cb5eec6

Update README.md

Files changed (1): README.md (+14 -3)
README.md CHANGED
@@ -53,12 +53,21 @@ This repo contains GGUF format model files for [brucethemoose's Yi 34B 200K DARE
 
 <!-- description end -->
 <!-- README_GGUF.md-about-gguf start -->
-### About GGUF
+### New GGUF formats
 
-GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.
+The GGUF files in this repo were made using new k-quant methods, added in January 2024.
 
-Here is an incomplete list of clients and libraries that are known to support GGUF:
+They are only compatible with llama.cpp from January 4th 2024 onwards. Other clients may not have been updated to support them yet.
+
+The new GGUF k-quant method enables the use of an "importance matrix", similar in concept to the calibration datasets used by GPTQ, AWQ and EXL2. This improves GGUF quantization quality.
+
+The dataset used to generate the importance matrix for these GGUFs was VMware open-instruct (5K lines).
+
+Use of the importance matrix enables new quant formats: IQ2_XXS, IQ2_XS and Q2_K_S.
+
+Note: adding support for this new GGUF quant method is still a work in progress for me. Other GGUF repos I'm creating won't necessarily have it, at least for the next couple of days.
 
+### Clients with GGUF support (not yet tested with this quant format specifically)
 * [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option.
 * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
 * [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for storytelling.
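
The importance-matrix workflow the added README text describes can be sketched with llama.cpp's own tools. This is a minimal sketch, not the author's exact commands: it assumes a llama.cpp build from January 4th 2024 or later (which ships the `imatrix` and `quantize` tools), and the file names `model-f16.gguf` and `open-instruct.txt` are hypothetical placeholders.

```shell
# Build llama.cpp (Jan 4th 2024 or later) to get the imatrix and quantize tools.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make imatrix quantize

# 1. Compute an importance matrix by running the full-precision model over a
#    calibration text file (analogous to the VMware open-instruct data the
#    README mentions). File names here are placeholders.
./imatrix -m model-f16.gguf -f open-instruct.txt -o imatrix.dat

# 2. Quantize with the importance matrix to one of the new low-bit formats
#    (IQ2_XXS, IQ2_XS or Q2_K_S).
./quantize --imatrix imatrix.dat model-f16.gguf model-iq2_xs.gguf IQ2_XS
```

The importance matrix only has to be computed once per base model; it can then be reused to produce each of the new quant types from the same f16 GGUF.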