Update README.md
README.md CHANGED

---
license: llama3.1
---

Experimental .GGUF quants for https://huggingface.co/google/gemma-2-9b-it, made according to the llama.cpp (LCPP) PR (based on b_3529, and now b_3565 for the newer ones): https://github.com/ggerganov/llama.cpp/pull/8836
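
For reference, a minimal sketch of how such a quant can be produced, assuming llama.cpp is built from the PR branch (IQ4_XSR is a quant type introduced by the PR, while IQ4_XS already exists on master); the FP16 GGUF and imatrix file names below are placeholders:

```
# Quantize the FP16 GGUF with an importance matrix.
# Assumes a build of llama.cpp from the PR branch; file names are placeholders.
./llama-quantize --imatrix imatrix-gemma-2-9b-it.dat \
    gemma-2-9b-it-F16.gguf gemma-2-9b-it-IQ4_XSR.gguf IQ4_XSR
```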

These experimental quant strategies, revisiting Ikawrakow's work, show a slight decrease in perplexity, including per BPW (from 10%+ for the lowest quants to 0.x% for the highest ones).
This is significant enough to encourage you folks to test them, and to provide feedback if pertinent.

The iMatrix I use is based on Group Merged V3, enriched with a bit of French, Serbian, and Croatian.
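
As a sketch of how such an imatrix can be computed with llama.cpp's llama-imatrix tool (the calibration and output file names here are placeholders, not the actual files used):

```
# Compute an importance matrix over the calibration text against the FP16 model.
# File names are placeholders.
./llama-imatrix -m gemma-2-9b-it-F16.gguf \
    -f groups_merged_v3_plus_fr_sr_hr.txt \
    -o imatrix-gemma-2-9b-it.dat
```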

ARC and PPL-512 data (get the latest data in the main post of the PR thread):

```
IQ4_XS

Master
Size : 4.13 GiB (4.42 BPW)
Arc-C 299 49.16387960
Arc-E 570 72.10526316
PPL 512 wikitext : 7.5226 +/- 0.04820

IQ4_XSR

PR
Size :
Arc-C 299
Arc-E 570
PPL 512 wikitext :

FP16

MASTER : Gemma 2 9b It F16.
Size : 14.96 GiB (16.00 BPW)
Arc-C 299 49.49832776
Arc-E 570 73.85964912
PPL 512 wikitext : 7.3224 +/- 0.04674
```
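
The PPL-512 figures are wikitext perplexity measured at a 512-token context; a minimal sketch of how that kind of number can be reproduced with llama.cpp's llama-perplexity tool (model and dataset paths are placeholders):

```
# Perplexity over the wikitext-2 test set at a 512-token context (PPL 512).
# Model and dataset paths are placeholders.
./llama-perplexity -m gemma-2-9b-it-IQ4_XSR.gguf \
    -f wikitext-2-raw/wiki.test.raw -c 512
```

The Arc-C / Arc-E scores come from a multiple-choice evaluation (llama-perplexity's --multiple-choice mode is the likely route); see the PR thread for the exact setup.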