Experimental GGUF quants for https://huggingface.co/google/gemma-2-9b-it, made according to the llama.cpp PR https://github.com/ggerganov/llama.cpp/pull/8836 (based on b3529, and on b3565 for the newer ones).
These experimental quant strategies revisit Ikawrakow's work and show a slight decrease in perplexity, including per BPW (from 10%+ for the lowest quants to 0.x% for the highest ones). This is significant enough to encourage you folks to test them and to provide feedback where pertinent.
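To make the "10%+ down to 0.x%" range concrete, here is a minimal sketch that computes the relative perplexity change between a MASTER quant and the latest PR quant, using PPL values copied from the data in this card. The helper name `ppl_change_pct` is illustrative only, and the comparison ignores the small size (BPW) differences between the two files.

```python
# Relative perplexity change between two quants, in percent.
# PPL values are copied from the "PPL 512 wikitext" lines of this card.

def ppl_change_pct(baseline: float, candidate: float) -> float:
    """Percent change of candidate vs baseline; negative means lower (better) PPL."""
    return (candidate - baseline) / baseline * 100.0

# quant name -> (MASTER PPL, latest PR PPL)
pairs = {
    "IQ1_S": (15.9317, 13.6266),
    "IQ2_XS": (10.5698, 10.0753),
    "IQ4_XS": (8.8456, 8.8370),
}

changes = {name: ppl_change_pct(master, pr) for name, (master, pr) in pairs.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

With these inputs the drop is roughly 14% for IQ1_S, under 5% for IQ2_XS, and about 0.1% for IQ4_XS. Note that for the highest quants the difference is smaller than the reported +/- error on each measurement.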
The iMatrix I use is based on Group Merged V3, enriched with a bit of French, a bit of Serbian, and a bit of Croatian.
ARC and PPL-512 data (get the latest data from the main post of the PR thread):
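A note on reading the Arc-C and Arc-E lines in the data below: each appears to give the number of questions (299 or 570) followed by the score in percent. That reading is an assumption, not something stated in this card. Under it, a minimal sketch can turn a score back into a count of correct answers, which shows how few questions separate two quants. The function names are illustrative.

```python
# Convert an ARC score line ("Arc-C 299 42.47491639") into a count of
# correct answers, ASSUMING the format is: number of questions, score in %.

def correct_answers(n_questions: int, score_pct: float) -> int:
    """Number of correctly answered questions implied by a percent score."""
    return round(n_questions * score_pct / 100.0)

def points_per_question(n_questions: int) -> float:
    """How many percentage points a single question is worth."""
    return 100.0 / n_questions

# IQ1_S, Arc-C, values copied from this card
iq1_s_master = correct_answers(299, 42.47491639)
iq1_s_pr1 = correct_answers(299, 43.14381271)

print(iq1_s_master, iq1_s_pr1, iq1_s_pr1 - iq1_s_master)
print(f"one Arc-C question = {points_per_question(299):.3f} points")
```

With only 299 questions, one answer moves Arc-C by about a third of a point, so differences of a point or two between quants correspond to a handful of questions and should be read with caution.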
IQ1_XS
PR 1 : Gemma 2 9b It IQ1_XS quant made from BF16
Size : 2.15 GiB (2.00 BPW)
Arc-C 299 42.80936455
Arc-E 570 68.24561404
PPL 512 wikitext : 15.1105 +/- 0.11363
PR 2 : Gemma 2 9b It IQ1_XS quant made from BF16
Size : 2.16 GiB (2.01 BPW)
PPL 512 wikitext : 14.9768 +/- 0.11234
PR 3 + IK iMat Fix : Gemma 2 9b It IQ1_XS quant made from BF16
Size : 2.16 GiB (2.01 BPW)
PPL 512 wikitext : 14.9768 +/- 0.11234 (apparently useless)
PR 4 :
Size : 2.18 GiB / 2.34 GB (2.02 BPW)
PPL over 569 chunks for n_ctx=512 = 14.5476 +/- 0.10791
IQ1_S
MASTER : Gemma 2 9b It IQ1_S, quant made from BF16
Size : 2.21 GiB (2.05 BPW)
Arc-C 299 42.47491639
Arc-E 570 66.84210526
PPL 512 wikitext : 15.9317 +/- 0.11979
PR 1 : Gemma 2 9b It IQ1_S quant made from BF16
Size : 2.23 GiB (2.07 BPW)
Arc-C 299 43.14381271
Arc-E 570 68.42105263
PPL 512 wikitext : 14.1578 +/- 0.10530
PR 2 : Gemma 2 9b It IQ1_S quant made from BF16
Size : 2.24 GiB (2.08 BPW)
PPL 512 wikitext : 14.0207 +/- 0.10399
PR 3 :
Size : 2.26 GiB / 2.43 GB (2.10 BPW)
PPL over 569 chunks for n_ctx=512 = 13.6266 +/- 0.10131
IQ1_M
MASTER : Gemma 2 9b It IQ1_M, quant made from BF16
Size : 2.37 GiB (2.20 BPW)
Arc-C 299 45.81939799
Arc-E 570 73.85964912
PPL 512 wikitext : 13.7215 +/- 0.10231
PR 1 : Gemma 2 9b It IQ1_M quant made from BF16
Size : 2.36 GiB (2.19 BPW)
Arc-C 299 45.81939799
Arc-E 570 74.56140351
PPL 512 wikitext : 12.6773 +/- 0.09336
PR 2 :
Size : 2.37 GiB / 2.55 GB (2.21 BPW)
PPL over 569 chunks for n_ctx=512 = 12.4616 +/- 0.09246
IQ1_XL
PR 1 : Gemma 2 9b It IQ1_XL quant made from BF16
Size : 2.48 GiB (2.30 BPW)
Arc-C 299 47.49163880
Arc-E 570 73.33333333
PPL 512 wikitext : 11.5001 +/- 0.08487
PR 2 : Gemma 2 9b It IQ1_XL quant made from BF16
Size : 2.47 GiB (2.29 BPW)
PPL 512 wikitext : 11.4824 +/- 0.08451
PR 3 :
Size : 2.51 GiB / 2.69 GB (2.33 BPW)
PPL over 569 chunks for n_ctx=512 = 11.1759 +/- 0.08205
IQ2_XXS
MASTER : Gemma 2 9b It IQ2_XXS, quant made from BF16
Size : 2.63 GiB (2.44 BPW)
Arc-C 299 48.16053512
Arc-E 570 73.15789474
PPL 512 wikitext : 11.2527 +/- 0.08307
PR 1 : Gemma 2 9b It IQ2_XXS, quant made from BF16
Size : 2.73 GiB (2.54 BPW)
Arc-C 299 48.82943144
Arc-E 570 74.56140351
PPL 512 wikitext : 10.8439 +/- 0.08026
PR 2 : Gemma 2 9b It IQ2_XXS, quant made from BF16
Size : 2.72 GiB (2.53 BPW)
PPL 512 wikitext : 10.8173 +/- 0.07986
PR 3 : Gemma 2 9b It IQ2_XXS, quant made from BF16
Size : 2.62 GiB (2.43 BPW)
PPL 512 wikitext : 10.8388 +/- 0.08010
PR 4 :
Size : 2.67 GiB / 2.86 GB (2.48 BPW)
PPL over 569 chunks for n_ctx=512 = 10.6688 +/- 0.07861
IQ2_XS
MASTER : Gemma 2 9b It IQ2_XS, quant made from BF16
Size : 2.85 GiB (2.65 BPW)
Arc-C 299 49.49832776
Arc-E 570 78.24561404
PPL 512 wikitext : 10.5698 +/- 0.07803
PR Init : Gemma 2 9b It IQ2_XS, quant made from BF16
Size : 2.91 GiB (2.70 BPW)
Arc-C 299 49.16387960
Arc-E 570 78.59649123
PPL 512 wikitext : 10.3607 +/- 0.07660
PR 2 : Gemma 2 9b It IQ2_XS, quant made from BF16
Size : 2.77 GiB (2.58 BPW)
PPL 512 wikitext : 10.3922 +/- 0.07672
PR 3 :
Size : 2.87 GiB / 3.09 GB (2.67 BPW)
PPL over 569 chunks for n_ctx=512 = 10.0753 +/- 0.07438
IQ2_S
MASTER : Gemma 2 9b It IQ2_S (with iMatrix, attn_output and attn.v in IQ3_S), quant made from BF16
Size : 2.99 GiB (2.77 BPW)
Arc-C 299 52.84280936
Arc-E 570 77.54385965
PPL 512 wikitext : 10.3868 +/- 0.07787
PR Init : Gemma 2 9b It IQ2_S (with Imatrix, attn_output in IQ3_XXS, and attn_v in Q4_K), quant made from BF16
Size : 3.00 GiB (2.79 BPW)
Arc-C 299 49.83277592
Arc-E 570 77.71929825
PPL 512 wikitext : 10.1303 +/- 0.07486
PR 2 : Gemma 2 9b It IQ2_S, quant made from BF16
Size : 3.00 GiB (2.79 BPW)
Arc-C 299 52.17391304
Arc-E 570 77.89473684
PPL 512 wikitext : 10.1071 +/- 0.07450
PR 3 :
Size : 2.99 GiB / 3.21 GB (2.78 BPW)
PPL over 569 chunks for n_ctx=512 = 9.8175 +/- 0.07239
IQ2_M
MASTER : Gemma 2 9b It IQ2_M (with iMatrix, attn_output and attn.v in IQ3_S), quant made from BF16
Size : 3.19 GiB (2.97 BPW)
Arc-C 299 56.52173913
Arc-E 570 77.01754386
PPL 512 wikitext : 9.8154 +/- 0.07324
PR init : Gemma 2 9b It IQ2_M (with Imatrix, attn_output in IQ3_XXS, and attn_v in Q4_K), quant made from BF16
Size : 3.20 GiB (2.98 BPW)
Arc-C 299 54.18060201
Arc-E 570 78.07017544
PPL 512 wikitext : 9.5734 +/- 0.07040
PR 2 : Gemma 2 9b It IQ2_M, quant made from BF16
Size : 3.29 GiB (3.06 BPW)
Arc-C 299 55.85284281
Arc-E 570 78.07017544
PPL 512 wikitext : 9.4128 +/- 0.06881
PR 3 :
Size : 3.19 GiB / 3.42 GB (2.96 BPW)
PPL over 569 chunks for n_ctx=512 = 9.4207 +/- 0.06894
IQ2_XL
PR Init : Gemma 2 9b It IQ2_XL, quant made from BF16
Size : 3.41 GiB (3.17 BPW)
Arc-C 299 56.18729097
Arc-E 570 78.07017544
PPL 512 wikitext : 9.3283 +/- 0.06820
PR 2 :
Size : 3.38 GiB / 3.63 GB (3.14 BPW)
PPL over 569 chunks for n_ctx=512 = 9.2667 +/- 0.06814
Q2_K_L
PR CURRENT : Gemma 2 9b It Q2_K_L, quant made from BF16
Size : 3.70 GiB (3.44 BPW)
Arc-C 299 58.19397993
Arc-E 570 79.29824561
PPL 512 wikitext : around 9.25
IQ3_XXS
MASTER : Gemma 2 9b It IQ3_XXS (with iMatrix, attn_k in IQ2_S, and attn_v in IQ3_XXS), quant made from BF16
Size : 3.53 GiB (3.28 BPW)
Arc-C 299 56.52173913
Arc-E 570 79.12280702
PPL 512 wikitext : 9.4116 +/- 0.06982
PR CURRENT : Gemma 2 9b It IQ3_XXS (with Imatrix, attn_k in IQ3_XXS, and attn_v in Q4_K), quant made from BF16
Size : 3.60 GiB (3.35 BPW)
Arc-C 299 56.18729097
Arc-E 570 78.77192982
PPL 512 wikitext : 9.2026 +/- 0.06781
IQ3_XS
MASTER : Gemma 2 9b It IQ3_XS (with iMatrix), quant made from BF16
Size : 3.85 GiB (3.58 BPW)
Arc-C 299 58.86287625
Arc-E 570 78.94736842
PPL 512 wikitext : 9.2584 +/- 0.06866
PR 1 : Gemma 2 9b It IQ3_XS (with Imatrix), quant made from BF16
Size : 3.82 GiB (3.55 BPW)
Arc-C 299 57.19063545
Arc-E 570 78.07017544
PPL 512 wikitext : 9.0658 +/- 0.06633
PR 2 :
Size : 3.88 GiB / 4.16 GB (3.60 BPW)
PPL over 569 chunks for n_ctx=512 = 8.9976 +/- 0.06585
IQ3_S
MASTER : Gemma 2 9b It IQ3_S (with iMatrix, attn_v in IQ3_S), quant made from BF16
Size : 4.03 GiB (3.75 BPW)
Arc-C 299 57.52508361
Arc-E 570 77.71929825
PPL 512 wikitext : 9.2100 +/- 0.06859
PR : Gemma 2 9b It IQ3_S (with Imatrix, attn_v in Q4_K), quant made from BF16
Size : 4.07 GiB (3.79 BPW)
Arc-C 299 57.19063545
Arc-E 570 78.07017544
PPL 512 wikitext : 9.0082 +/- 0.06633
PR rev 2: Gemma 2 9b It IQ3_S (with Imatrix), quant made from BF16
Size : 4.07 GiB (3.79 BPW)
Arc-C 299 56.85618729
Arc-E 570 78.42105263
PPL 512 wikitext : 9.0082 +/- 0.06633
(I think ARC differences are due to the b3565 merge)
PR rev3 - CURRENT: Gemma 2 9b It IQ3_S (with Imatrix), quant made from BF16
Size : 4.05 GiB (3.76 BPW)
Arc-C 299 57.52508361
Arc-E 570 78.42105263
PPL 512 wikitext : 8.9969 +/- 0.06610
PR 4 :
Size : 4.05 GiB / 4.35 GB (3.76 BPW)
PPL over 569 chunks for n_ctx=512 = 8.9734 +/- 0.06584
IQ3_M
MASTER : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in Q4_K), quant made from BF16
Size : 4.18 GiB (3.89 BPW)
Arc-C 299 56.85618729
Arc-E 570 77.71929825
PPL 512 wikitext : 8.9697 +/- 0.06598
PR : Gemma 2 9b It IQ3_M (with Imatrix, attn_output in IQ4_XS), quant made from BF16
Size : 4.16 GiB (3.87 BPW)
Arc-C 299 57.19063545
Arc-E 570 77.71929825
PPL 512 wikitext : 8.9556 +/- 0.06586
PR rev2 : Gemma 2 9b It IQ3_M (with Imatrix, attn_output in IQ4_XS, attn.v Q5_K), quant made from BF16
Size : 4.20 GiB (3.90 BPW)
Arc-C 299 58.52842809
Arc-E 570 77.54385965
PPL 512 wikitext : 8.9445 +/- 0.06576
PR rev3 - CURRENT : Gemma 2 9b It IQ3_M (with Imatrix, attn_output in IQ4_XS, attn.v Q5_K, attn.k IQ4_XS), quant made from BF16
Size : 4.23 GiB (3.93 BPW)
Arc-C 299 58.19397993
Arc-E 570 77.19298246
PPL 512 wikitext : 8.9082 +/- 0.06536
IQ3_XL
PR CURRENT : Gemma 2 9b It IQ3_XL (with Imatrix), quant made from BF16
Size : 4.50 GiB (4.18 BPW)
Arc-C 299 56.85618729
Arc-E 570 78.42105263
PPL 512 wikitext : 8.8843 +/- 0.06558
IQ4_XS
MASTER : Gemma 2 9b It IQ4_XS (with iMatrix), quant made from BF16
Size : 4.87 GiB (4.52 BPW)
Arc-C 299 57.52508361
Arc-E 570 78.24561404
PPL 512 wikitext : 8.8456 +/- 0.06533
PR CURRENT : Gemma 2 9b It IQ4_XS (with iMatrix), quant made from BF16
Size : 4.91 GiB (4.56 BPW)
PPL 512 wikitext : 8.8370 +/- 0.06525
Q4_K_S
Master
Size : 5.13 GiB (4.77 BPW)
PPL 512 wikitext : 8.8367 +/- 0.06523
Q4_K_M
Master
Size : 5.40 GiB (5.02 BPW)
PPL 512 wikitext : 8.8054 +/- 0.06487
Q5_K_S
Master
Size : 6.03 GiB (5.61 BPW)
PPL 512 wikitext : 8.8067 +/- 0.06511
Q5_K_M
Master
Size : 6.19 GiB (5.75 BPW)
PPL 512 wikitext : 8.7973 +/- 0.06502
FP16
MASTER : Gemma 2 9b It F16.
Size : 17.22 GiB (16.00 BPW)
Arc-C 299 59.53177258
Arc-E 570 78.77192982
PPL 512 wikitext : 8.7881 +/- 0.06533
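The Size and BPW figures above are tied together by the model's weight count. As a rough sketch, the count can be inferred from the F16 line, assuming the file is exactly 16 bits per weight and that metadata overhead is negligible (both are assumptions). From there, BPW can be recomputed for any quant, and GiB converted to GB. The function names are illustrative.

```python
# Relate file size, bits per weight (BPW) and weight count.
# ASSUMPTION: the F16 file is exactly 16.00 BPW and metadata is negligible.

GIB = 2**30   # bytes in one GiB (binary)
GB = 10**9    # bytes in one GB (decimal)

def params_from_size(size_gib: float, bpw: float) -> float:
    """Weight count implied by a file size and its bits per weight."""
    return size_gib * GIB * 8 / bpw

def bpw_from_size(size_gib: float, n_params: float) -> float:
    """Bits per weight implied by a file size and a weight count."""
    return size_gib * GIB * 8 / n_params

def gib_to_gb(size_gib: float) -> float:
    """Convert a size in GiB to decimal GB."""
    return size_gib * GIB / GB

# Inferred from "Size : 17.22 GiB (16.00 BPW)"
n_params = params_from_size(17.22, 16.00)

# Check against the IQ1_S PR 3 entry: 2.26 GiB, 2.43 GB, 2.10 BPW
iq1_s_pr3_bpw = bpw_from_size(2.26, n_params)
iq1_s_pr3_gb = gib_to_gb(2.26)

print(f"weights ~ {n_params / 1e9:.2f} B")
print(f"IQ1_S PR 3: {iq1_s_pr3_bpw:.2f} BPW, {iq1_s_pr3_gb:.2f} GB")
```

The recomputed values agree with the listed ones to two decimals, which is a quick way to sanity-check a size line when comparing quants per BPW.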