Full Perplexity Comparison Table for Release Documentation
Quantization Type | PPL(Q) | ln(PPL(Q)/PPL(fp16)) | File Size (G) |
---|---|---|---|
IQ2_S | 25.3893 | 0.501266 | 1.6 |
IQ2_M | 21.6684 | 0.342794 | 1.6 |
Q3_K_M | 16.8567 | 0.091687 | 1.8 |
IQ3_M | 16.7740 | 0.086769 | 1.7 |
Q3_K_L | 16.5067 | 0.070705 | 1.8 |
IQ4_NL | 15.9602 | 0.037037 | 1.9 |
IQ4_XS | 15.9591 | 0.036968 | 1.8 |
Q4_K_S | 15.9346 | 0.035431 | 1.9 |
Q4_K_M | 15.8651 | 0.031060 | 2.0 |
Q5_K_S | 15.4901 | 0.007140 | 2.1 |
Q5_K_M | 15.4746 | 0.006139 | 2.2 |
Q6_K | 15.3961 | 0.001053 | 2.4 |
Q8_0 | 15.3831 | 0.000208 | 2.7 |
bf16 | 15.3799 | 0.000000 | 4.2 |
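The table lists every quantization type tested, with its perplexity (PPL), the log ratio ln(PPL(Q)/PPL(fp16)) against the unquantized baseline, and its file size. The baseline is the bf16 row, whose log ratio is zero by definition. Lower values in both perplexity columns mean less quality loss relative to that baseline.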
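The log-ratio column can be recomputed from the PPL(Q) column, which is a quick way to sanity-check the table. The following is a minimal sketch, assuming the bf16 row is the baseline (its log ratio is zero in the table). The function name `log_ppl_ratio` and the `PPL` dictionary are illustrative and not part of any tool; only a subset of rows is copied in for brevity.

```python
import math

# Perplexity values copied from a few rows of the table above.
PPL = {
    "IQ2_S": 25.3893,
    "Q4_K_M": 15.8651,
    "Q8_0": 15.3831,
    "bf16": 15.3799,  # assumed baseline: its log ratio in the table is 0
}


def log_ppl_ratio(ppl_q: float, ppl_base: float) -> float:
    """Return ln(PPL(Q) / PPL(base)); 0.0 means identical perplexity."""
    return math.log(ppl_q / ppl_base)


base = PPL["bf16"]
ratios = {name: log_ppl_ratio(ppl, base) for name, ppl in PPL.items()}

for name, ratio in ratios.items():
    # exp(ratio) - 1 is the relative perplexity increase over the baseline.
    print(f"{name:8s} ln-ratio={ratio:.6f}  increase={math.exp(ratio) - 1:.2%}")
```

Because the ratio is logarithmic, small values read almost directly as a relative increase: the Q4_K_M value of 0.031060 corresponds to a perplexity roughly 3% above the baseline. Small differences in the last digit are possible, since the table values are rounded.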