Update README.md

This is [Nous Hermes 2 Mistral 7B](https://huggingface.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO), quantized with the help of an importance matrix (imatrix) so that it retains more quality at each quantization level, including levels small enough for lower-memory devices to run. [Kalomaze's "groups_merged.txt"](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384) was used for the importance matrix, with the context set to 8,192.
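
If you want to reproduce quants like these, here is a minimal sketch using llama.cpp's `imatrix` and `quantize` tools. The file names (the F16 GGUF, `groups_merged.txt`, and the outputs) are assumptions, and newer llama.cpp builds name the binaries differently (e.g. `llama-imatrix`):

```sh
# Compute the importance matrix from the calibration text
# (assumes an F16 GGUF conversion of the model already exists).
./imatrix -m Nous-Hermes-2-Mistral-7B-DPO.F16.gguf \
    -f groups_merged.txt \
    -c 8192 \
    -o imatrix.dat

# Quantize with the importance matrix, e.g. to IQ2_XS.
./quantize --imatrix imatrix.dat \
    Nous-Hermes-2-Mistral-7B-DPO.F16.gguf \
    Nous-Hermes-2-Mistral-7B-DPO.IQ2_XS.gguf \
    IQ2_XS
```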

Here's a chart that approximates the HellaSwag score (out of 1,000 tasks) and the RAM usage (with `--no-mmap`) in llama.cpp. The chart is incomplete, and because the tasks are randomized, the scores may be slightly imprecise:

|Quantization|HellaSwag|RAM (256 ctx)|RAM (512 ctx)|RAM (1024 ctx)|RAM (2048 ctx)|RAM (4096 ctx)|RAM (8192 ctx)|
|--------|--------|--------|--------|--------|--------|--------|--------|
|IQ1_S|51.7%|1.6 GiB|1.6 GiB|1.7 GiB|1.8 GiB|2.0 GiB|2.5 GiB|
|IQ2_XXS|72.5%|1.9 GiB|1.9 GiB|2.0 GiB|2.1 GiB|2.4 GiB|2.9 GiB|
|IQ2_XS|74.2%|2.1 GiB|2.1 GiB|2.2 GiB|2.3 GiB|2.6 GiB|3.1 GiB|
|IQ2_S|76.8%|2.2 GiB|2.2 GiB|2.3 GiB|2.4 GiB|2.7 GiB|3.2 GiB|
|Q2_K (original)|77.4%|2.6 GiB|2.6 GiB|2.7 GiB|2.8 GiB|3.1 GiB|3.6 GiB|
|Q2_K|78.7%| | | | | | |
|IQ3_XXS|79.7%| | | | | | |
|IQ3_XS|80.6%| | | | | | |
|IQ3_S|81.2%| | | | | | |
|IQ3_M|81.1%| | | | | | |
|IQ4_XS|82.0%| | | | | | |
|IQ4_NL|82.0%| | | | | | |
|Q3_K_M (original)|80.0%|3.3 GiB|3.4 GiB|3.4 GiB|3.6 GiB|3.8 GiB|4.3 GiB|
|Q3_K_M|80.9%| | | | | | |
|Q4_K_M (original)|81.8%|4.1 GiB|4.2 GiB|4.2 GiB|4.3 GiB|4.6 GiB|5.1 GiB|
|Q4_K_M|81.9%| | | | | | |
|Q5_K_M (original)|82.1%|4.8 GiB|4.9 GiB|4.9 GiB|5.1 GiB|5.3 GiB|5.8 GiB|
|Q5_K_M|81.5%| | | | | | |
|Q6_K|81.7%|5.6 GiB|5.6 GiB|5.7 GiB|5.8 GiB|6.1 GiB|6.6 GiB|
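
The scores above come from llama.cpp's HellaSwag evaluation. As a rough sketch of how a single quant can be measured (the data file name follows llama.cpp's `perplexity` example; treat the exact invocation as an assumption):

```sh
# Score a quant on 1,000 HellaSwag tasks; hellaswag_val_full.txt is the
# validation data file used by llama.cpp's perplexity example.
./perplexity -m Nous-Hermes-2-Mistral-7B-DPO.IQ3_XS.gguf \
    -f hellaswag_val_full.txt \
    --hellaswag \
    --hellaswag-tasks 1000
```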

I don't recommend using IQ1_S. If you're that memory-limited, you may be better off using TinyDolphin-1.1B (HellaSwag: 59.0%) or Dolphin 2.6 Phi-2 (HellaSwag: 71.6%).
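
To check the RAM figures on your own machine, load a quant with `--no-mmap` so the whole model is read into memory rather than memory-mapped. A minimal sketch (the file name and prompt are placeholders; newer llama.cpp builds call the binary `llama-cli`):

```sh
# Fully load the model into RAM (no memory-mapping) with a 4,096-token context.
./main -m Nous-Hermes-2-Mistral-7B-DPO.IQ3_XS.gguf \
    -c 4096 \
    --no-mmap \
    -p "Write a haiku about quantization." \
    -n 128
```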

The original GGUFs can be found at [NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF](https://huggingface.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF). Original model card below.

***