ikawrakow/various-2bit-sota-gguf

ikawrakow committed on Jan 10

Commit ffe00e8 • 1 Parent(s): 6b581ce

Update README.md

Files changed (1)

README.md +4 -1

@@ -1,4 +1,7 @@
 ---
 license: apache-2.0
 ---
-Various models in GGUF format quantized with a new 2-bit approach. Intended for use with llama.cpp. Requires llama.cpp PR 4773.

+Various models in GGUF format quantized with a new 2-bit approach. Intended for use with llama.cpp. Requires llama.cpp PR 4773.
+Update: PR 4773 has been merged into `llama.cpp`, but I have added new models that require PR 4856.
+The new models are those that have around 2.3-2.4 bpw. They have a lower quantization error at the expense of being ~10% larger.