---
license: gemma
---
Experimental GGUF quants of https://huggingface.co/google/gemma-2-9b-it, made according to the llama.cpp (LCPP) PR https://github.com/ggerganov/llama.cpp/pull/8836 (based on b3529, and now b3565 for the newer ones).
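To test one of these quants, a minimal llama.cpp invocation looks like the sketch below; the GGUF file name is a placeholder for whichever quant you download from this repo:

```
# Sketch: run a short prompt on one of these quants (file name is an example).
# -ngl 99 offloads all layers to the GPU; drop it for CPU-only inference.
llama-cli -m gemma-2-9b-it-IQ4_XSR.gguf -ngl 99 -p "Explain GGUF quantization in two sentences."
```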
These experimental quant strategies, which revisit Ikawrakow's work, show a slight decrease in perplexity, including per bpw: from 10%+ for the lowest quants down to 0.x% for the highest ones. For example, in the data below, the PR's IQ4_XSR reaches a PPL-512 of 7.5072 versus 7.5226 for master's IQ4_XS, roughly 0.2% lower for 0.03 BPW more. This is significant enough to encourage you folks to test them, and to provide feedback if pertinent.
The iMatrix I use is based on Group Merged V3, enriched with a bit of French, a bit of Serbian, and a bit of Croatian.
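For reference, this is roughly how an iMatrix is produced and applied with llama.cpp's stock tools; a sketch with placeholder file names, not the exact calibration file used here:

```
# 1) Compute an importance matrix from the FP16 model over calibration text.
llama-imatrix -m gemma-2-9b-it-F16.gguf -f calibration_data.txt -o imatrix.dat

# 2) Quantize with that imatrix (IQ4_XS shown; the PR revises the quant strategies).
llama-quantize --imatrix imatrix.dat gemma-2-9b-it-F16.gguf gemma-2-9b-it-IQ4_XS.gguf IQ4_XS
```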
ARC and PPL-512 data (get the latest data in the main post of the PR thread):
```
IQ4_XS (Master)
Size : 4.13 GiB (4.42 BPW)
Arc-C 299 49.16387960
Arc-E 570 72.10526316
PPL 512 wikitext : 7.5226 +/- 0.04820

IQ4_XSR (PR)
Size : 4.16 GiB (4.45 BPW)
Arc-C 299
Arc-E 570
PPL 512 wikitext : 7.5072 +/- 0.04814

FP16 (Master) : Gemma 2 9b It F16
Size : 14.96 GiB (16.00 BPW)
Arc-C 299 49.49832776
Arc-E 570 73.85964912
PPL 512 wikitext : 7.3224 +/- 0.04674
```
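To reproduce the PPL-512 figures above, llama.cpp's perplexity tool can be run at a 512-token context; a sketch, assuming a local copy of the wikitext-2 raw test file:

```
# Perplexity over wikitext-2 (raw test split) at context size 512,
# matching the "PPL 512 wikitext" rows above.
llama-perplexity -m gemma-2-9b-it-IQ4_XSR.gguf -f wiki.test.raw -c 512
```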