Update README.md

This is [Nous Hermes 2 Mistral 7B](https://huggingface.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO), quantized with the help of an importance matrix (imatrix) so that it retains more quality at each quantization level, including levels small enough for lower-memory devices to run. [Kalomaze's "groups_merged.txt"](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384) was used for the importance matrix, with the context set to 8,192.
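
If you want to reproduce quants like these, here is a minimal sketch using llama.cpp's `imatrix` and `quantize` tools. The file names (the F16 GGUF, `groups_merged.txt`, and the outputs) are assumptions, and newer llama.cpp builds name the binaries differently (e.g. `llama-imatrix`):

```sh
# Compute the importance matrix from the calibration text
# (assumes an F16 GGUF conversion of the model already exists).
./imatrix -m Nous-Hermes-2-Mistral-7B-DPO.F16.gguf \
    -f groups_merged.txt \
    -c 8192 \
    -o imatrix.dat

# Quantize with the importance matrix, e.g. to IQ2_XS.
./quantize --imatrix imatrix.dat \
    Nous-Hermes-2-Mistral-7B-DPO.F16.gguf \
    Nous-Hermes-2-Mistral-7B-DPO.IQ2_XS.gguf \
    IQ2_XS
```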

Here's a chart that approximates the HellaSwag score (out of 1,000 tasks) and the RAM usage (with `--no-mmap`) in llama.cpp. The chart is incomplete, and because the tasks are randomized, the scores may be slightly imprecise:

|Quantization|HellaSwag|RAM (256 ctx)|RAM (512 ctx)|RAM (1024 ctx)|RAM (2048 ctx)|RAM (4096 ctx)|RAM (8192 ctx)|
|--------|--------|--------|--------|--------|--------|--------|--------|
|IQ1_S|51.7%|1.6 GiB|1.6 GiB|1.7 GiB|1.8 GiB|2.0 GiB|2.5 GiB|
|IQ2_XXS|72.5%|1.9 GiB|1.9 GiB|2.0 GiB|2.1 GiB|2.4 GiB|2.9 GiB|
|IQ2_XS|74.2%|2.1 GiB|2.1 GiB|2.2 GiB|2.3 GiB|2.6 GiB|3.1 GiB|
|IQ2_S|76.8%|2.2 GiB|2.2 GiB|2.3 GiB|2.4 GiB|2.7 GiB|3.2 GiB|
|Q2_K (original)|77.4%|2.6 GiB|2.6 GiB|2.7 GiB|2.8 GiB|3.1 GiB|3.6 GiB|
|Q2_K|78.7%| | | | | | |
|IQ3_XXS|79.7%| | | | | | |
|IQ3_XS|80.6%| | | | | | |
|IQ3_S|81.2%| | | | | | |
|IQ3_M|81.1%| | | | | | |
|IQ4_XS|82.0%| | | | | | |
|IQ4_NL|82.0%| | | | | | |
|Q3_K_M (original)|80.0%|3.3 GiB|3.4 GiB|3.4 GiB|3.6 GiB|3.8 GiB|4.3 GiB|
|Q3_K_M|80.9%| | | | | | |
|Q4_K_M (original)|81.8%|4.1 GiB|4.2 GiB|4.2 GiB|4.3 GiB|4.6 GiB|5.1 GiB|
|Q4_K_M|81.9%| | | | | | |
|Q5_K_M (original)|82.1%|4.8 GiB|4.9 GiB|4.9 GiB|5.1 GiB|5.3 GiB|5.8 GiB|
|Q5_K_M|81.5%| | | | | | |
|Q6_K|81.7%|5.6 GiB|5.6 GiB|5.7 GiB|5.8 GiB|6.1 GiB|6.6 GiB|
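
The scores above come from llama.cpp's HellaSwag evaluation. As a rough sketch of how a single quant can be measured (the data file name follows llama.cpp's `perplexity` example; treat the exact invocation as an assumption):

```sh
# Score a quant on 1,000 HellaSwag tasks; hellaswag_val_full.txt is the
# validation data file used by llama.cpp's perplexity example.
./perplexity -m Nous-Hermes-2-Mistral-7B-DPO.IQ3_XS.gguf \
    -f hellaswag_val_full.txt \
    --hellaswag \
    --hellaswag-tasks 1000
```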

I don't recommend using IQ1_S. If you're that memory-limited, you may be better off using TinyDolphin-1.1B (HellaSwag: 59.0%) or Dolphin 2.6 Phi-2 (HellaSwag: 71.6%).
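
To check the RAM figures on your own machine, load a quant with `--no-mmap` so the whole model is read into memory rather than memory-mapped. A minimal sketch (the file name and prompt are placeholders; newer llama.cpp builds call the binary `llama-cli`):

```sh
# Fully load the model into RAM (no memory-mapping) with a 4,096-token context.
./main -m Nous-Hermes-2-Mistral-7B-DPO.IQ3_XS.gguf \
    -c 4096 \
    --no-mmap \
    -p "Write a haiku about quantization." \
    -n 128
```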

The original GGUFs can be found at [NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF](https://huggingface.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF). Original model card below.

***