Update README.md
README.md CHANGED

---
license: llama3.1
---

Experimental .GGUF quants for https://huggingface.co/google/gemma-2-9b-it, made according to the llama.cpp (LCPP) PR (based on b_3529, and now b_3565 for the newer ones): https://github.com/ggerganov/llama.cpp/pull/8836
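
For reference, a minimal sketch of how such a quant can be produced, assuming llama.cpp is built from the PR branch (IQ4_XSR is a quant type introduced by the PR, while IQ4_XS already exists on master); the FP16 GGUF and imatrix file names below are placeholders:

```
# Quantize the FP16 GGUF with an importance matrix.
# Assumes a build of llama.cpp from the PR branch; file names are placeholders.
./llama-quantize --imatrix imatrix-gemma-2-9b-it.dat \
    gemma-2-9b-it-F16.gguf gemma-2-9b-it-IQ4_XSR.gguf IQ4_XSR
```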

These experimental quant strategies, revisiting Ikawrakow's work, show a slight decrease in perplexity, including per BPW (from 10%+ for the lowest quants to 0.x% for the highest ones).
This is significant enough to encourage you folks to test them, and to provide feedback if pertinent.

The iMatrix I use is based on Group Merged V3, enriched with a bit of French, Serbian, and Croatian.
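
As a sketch of how such an imatrix can be computed with llama.cpp's llama-imatrix tool (the calibration and output file names here are placeholders, not the actual files used):

```
# Compute an importance matrix over the calibration text against the FP16 model.
# File names are placeholders.
./llama-imatrix -m gemma-2-9b-it-F16.gguf \
    -f groups_merged_v3_plus_fr_sr_hr.txt \
    -o imatrix-gemma-2-9b-it.dat
```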

ARC and PPL-512 data (get the latest data in the main post of the PR thread):

```
IQ4_XS

Master
Size : 4.13 GiB (4.42 BPW)
Arc-C 299 49.16387960
Arc-E 570 72.10526316
PPL 512 wikitext : 7.5226 +/- 0.04820

IQ4_XSR

PR
Size :
Arc-C 299
Arc-E 570
PPL 512 wikitext :

FP16

MASTER : Gemma 2 9b It F16.
Size : 14.96 GiB (16.00 BPW)
Arc-C 299 49.49832776
Arc-E 570 73.85964912
PPL 512 wikitext : 7.3224 +/- 0.04674
```
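
The PPL-512 figures are wikitext perplexity measured at a 512-token context; a minimal sketch of how that kind of number can be reproduced with llama.cpp's llama-perplexity tool (model and dataset paths are placeholders):

```
# Perplexity over the wikitext-2 test set at a 512-token context (PPL 512).
# Model and dataset paths are placeholders.
./llama-perplexity -m gemma-2-9b-it-IQ4_XSR.gguf \
    -f wikitext-2-raw/wiki.test.raw -c 512
```

The Arc-C / Arc-E scores come from a multiple-choice evaluation (llama-perplexity's --multiple-choice mode is the likely route); see the PR thread for the exact setup.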