Update README.md
Including the original LLaMA 3 model files cloned from the Meta HF repo. (https
If you have issues downloading the models from Meta or converting models for `llama.cpp`, feel free to download this one!
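As a minimal sketch of that download step (the repository id and `.gguf` filename below are placeholders, not taken from this page), a single quantized file can be fetched with `huggingface-cli` and run with `llama.cpp`:

```shell
# Placeholder repo id and filename -- substitute the actual values
# from this repository's file list.
huggingface-cli download <repo-id> <model>.Q4_K_M.gguf --local-dir .

# Run it with llama.cpp's CLI (the binary name depends on the build;
# older builds call it `main` instead of `llama-cli`).
./llama-cli -m <model>.Q4_K_M.gguf -p "Hello" -n 64
```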
## Perplexity table on LLaMA 3 70B

Lower perplexity is better. (credit to: [dranger003](https://github.com/ggerganov/llama.cpp/pull/6745#issuecomment-2093892514))

| Quantization | Size (GiB) | Perplexity (wiki.test) | Delta vs. FP16 |
|--------------|------------|------------------------|----------------|
| IQ1_S        | 14.29      | 9.8655 +/- 0.0625      | 248.51%        |
| IQ1_M        | 15.60      | 8.5193 +/- 0.0530      | 201.94%        |
| IQ2_XXS      | 17.79      | 6.6705 +/- 0.0405      | 135.64%        |
| IQ2_XS       | 19.69      | 5.7486 +/- 0.0345      | 103.07%        |
| IQ2_S        | 20.71      | 5.5215 +/- 0.0318      | 95.05%         |
| Q2_K_S       | 22.79      | 5.4334 +/- 0.0325      | 91.94%         |
| IQ2_M        | 22.46      | 4.8959 +/- 0.0276      | 72.35%         |
| Q2_K         | 24.56      | 4.7763 +/- 0.0274      | 68.73%         |
| IQ3_XXS      | 25.58      | 3.9671 +/- 0.0211      | 40.14%         |
| IQ3_XS       | 27.29      | 3.7210 +/- 0.0191      | 31.45%         |
| Q3_K_S       | 28.79      | 3.6502 +/- 0.0192      | 28.95%         |
| IQ3_S        | 28.79      | 3.4698 +/- 0.0174      | 22.57%         |
| IQ3_M        | 29.74      | 3.4402 +/- 0.0171      | 21.53%         |
| Q3_K_M       | 31.91      | 3.3617 +/- 0.0172      | 18.75%         |
| Q3_K_L       | 34.59      | 3.3016 +/- 0.0168      | 16.63%         |
| IQ4_XS       | 35.30      | 3.0310 +/- 0.0149      | 7.07%          |
| IQ4_NL       | 37.30      | 3.0261 +/- 0.0149      | 6.90%          |
| Q4_K_S       | 37.58      | 3.0050 +/- 0.0148      | 6.15%          |
| Q4_K_M       | 39.60      | 2.9674 +/- 0.0146      | 4.83%          |
| Q5_K_S       | 45.32      | 2.8843 +/- 0.0141      | 1.89%          |
| Q5_K_M       | 46.52      | 2.8656 +/- 0.0139      | 1.23%          |
| Q6_K         | 53.91      | 2.8441 +/- 0.0138      | 0.47%          |
| Q8_0         | 69.83      | 2.8316 +/- 0.0138      | 0.03%          |
| F16          | 131.43     | 2.8308 +/- 0.0138      | 0.00%          |
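The delta column above is simply the relative perplexity increase over the F16 baseline, expressed as a percentage. A small Python sketch (values copied from the table) shows the arithmetic:

```python
# Derive the "Delta vs. FP16" column from the perplexity values above.
# Each delta is the relative increase over the F16 baseline, in percent.

F16_PPL = 2.8308  # F16 perplexity on wiki.test, from the table

def delta_vs_f16(ppl: float, baseline: float = F16_PPL) -> float:
    """Relative perplexity increase over the F16 baseline, in percent."""
    return (ppl / baseline - 1.0) * 100.0

# A few rows from the table, quantization name -> perplexity.
quants = {
    "IQ1_S": 9.8655,
    "Q4_K_M": 2.9674,
    "Q8_0": 2.8316,
}

for name, ppl in quants.items():
    print(f"{name}: +{delta_vs_f16(ppl):.2f}%")
# prints IQ1_S: +248.51%, Q4_K_M: +4.83%, Q8_0: +0.03%
```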
**Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the model [README](https://github.com/meta-llama/llama3). For more technical information about generation parameters and recipes for how to use Llama 3 in applications, please go [here](https://github.com/meta-llama/llama-recipes).
## License