BramVanroy/fietje-2-gguf


BramVanroy committed on May 1

Commit 627a04d • 1 Parent(s): 48fb0e5

Update README.md

Files changed (1)

README.md +12 -11


@@ -8,16 +8,17 @@ tags:
 This repository contains quantized versions of [BramVanroy/fietje-2b](https://huggingface.co/BramVanroy/fietje-2b):
-- `-f16` (5.6GB): best quality, but largest and slowest (recommended if you have the capacity, otherwise q8_0)
-- `-q8_0` (3.0GB): minimal quality loss, smaller
-- `-q5_k_m` (2.0GB): users have reported considerable quality loss in the chat `q5_k_m` version so you may want to avoid it
-Also available on ollama:
-```sh
-# defaults to f16
-ollama run bramvanroy/fietje-2b
-ollama run bramvanroy/fietje-2b:f16
-ollama run bramvanroy/fietje-2b:q8_0
-ollama run bramvanroy/fietje-2b:q5_k_m
-```

+Available quantization types and expected performance differences compared to base `f16`, higher perplexity=worse (from llama.cpp):
+```
+Q3_K_M  :  3.07G, +0.2496 ppl @ LLaMA-v1-7B
+Q4_K_M  :  3.80G, +0.0532 ppl @ LLaMA-v1-7B
+Q5_K_M  :  4.45G, +0.0122 ppl @ LLaMA-v1-7B
+Q6_K    :  5.15G, +0.0008 ppl @ LLaMA-v1-7B
+Q8_0    :  6.70G, +0.0004 ppl @ LLaMA-v1-7B
+F16     : 13.00G              @ 7B
+```
+Also available on [ollama](https://ollama.com/bramvanroy/fietje-2b).
+Quants were made with release [`b2777`](https://github.com/ggerganov/llama.cpp/releases/tag/b2777) of llama.cpp.
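
To make the size/quality trade-off in the added table easier to compare, here is a minimal Python sketch that expresses each quantization type's size as a fraction of F16 alongside its perplexity increase. The numbers are copied verbatim from the llama.cpp table above, which was measured on LLaMA-v1-7B, so the ratios are only a rough guide for Fietje. The names `QUANTS`, `F16_SIZE_G`, and `summarize` are hypothetical and not part of llama.cpp or this repository.

```python
# Sizes (in G, as listed) and perplexity deltas copied from the llama.cpp
# table quoted in the README diff. They describe LLaMA-v1-7B, not Fietje.
F16_SIZE_G = 13.00

QUANTS = {
    "Q3_K_M": (3.07, 0.2496),
    "Q4_K_M": (3.80, 0.0532),
    "Q5_K_M": (4.45, 0.0122),
    "Q6_K": (5.15, 0.0008),
    "Q8_0": (6.70, 0.0004),
}


def summarize(quants=QUANTS, f16_size=F16_SIZE_G):
    """Return (name, size relative to F16, perplexity increase) per quant type."""
    return [
        (name, size_g / f16_size, ppl_delta)
        for name, (size_g, ppl_delta) in quants.items()
    ]


if __name__ == "__main__":
    for name, ratio, ppl_delta in summarize():
        print(f"{name:<7} {ratio:6.1%} of F16 size, +{ppl_delta:.4f} ppl")
```

Read this way, the table shows the perplexity penalty shrinking steadily as the file grows, with `Q6_K` and `Q8_0` both within 0.001 ppl of F16 on the reference model.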