Update README.md
Including the original LLaMA 3 model files cloned from the Meta HF repo. (https
If you have issues downloading the models from Meta or converting models for `llama.cpp`, feel free to download this one!
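As a minimal sketch of that download step (the repository id and `.gguf` filename below are placeholders, not taken from this page), a single quantized file can be fetched with `huggingface-cli` and run with `llama.cpp`:

```shell
# Placeholder repo id and filename -- substitute the actual values
# from this repository's file list.
huggingface-cli download <repo-id> <model>.Q4_K_M.gguf --local-dir .

# Run it with llama.cpp's CLI (the binary name depends on the build;
# older builds call it `main` instead of `llama-cli`).
./llama-cli -m <model>.Q4_K_M.gguf -p "Hello" -n 64
```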
## Perplexity table on LLaMA 3 70B

Lower perplexity is better. (credit to: [dranger003](https://github.com/ggerganov/llama.cpp/pull/6745#issuecomment-2093892514))

| Quantization | Size (GiB) | Perplexity (wiki.test) | Delta vs. FP16 |
|--------------|------------|------------------------|----------------|
| IQ1_S        | 14.29      | 9.8655 +/- 0.0625      | 248.51%        |
| IQ1_M        | 15.60      | 8.5193 +/- 0.0530      | 201.94%        |
| IQ2_XXS      | 17.79      | 6.6705 +/- 0.0405      | 135.64%        |
| IQ2_XS       | 19.69      | 5.7486 +/- 0.0345      | 103.07%        |
| IQ2_S        | 20.71      | 5.5215 +/- 0.0318      | 95.05%         |
| Q2_K_S       | 22.79      | 5.4334 +/- 0.0325      | 91.94%         |
| IQ2_M        | 22.46      | 4.8959 +/- 0.0276      | 72.35%         |
| Q2_K         | 24.56      | 4.7763 +/- 0.0274      | 68.73%         |
| IQ3_XXS      | 25.58      | 3.9671 +/- 0.0211      | 40.14%         |
| IQ3_XS       | 27.29      | 3.7210 +/- 0.0191      | 31.45%         |
| Q3_K_S       | 28.79      | 3.6502 +/- 0.0192      | 28.95%         |
| IQ3_S        | 28.79      | 3.4698 +/- 0.0174      | 22.57%         |
| IQ3_M        | 29.74      | 3.4402 +/- 0.0171      | 21.53%         |
| Q3_K_M       | 31.91      | 3.3617 +/- 0.0172      | 18.75%         |
| Q3_K_L       | 34.59      | 3.3016 +/- 0.0168      | 16.63%         |
| IQ4_XS       | 35.30      | 3.0310 +/- 0.0149      | 7.07%          |
| IQ4_NL       | 37.30      | 3.0261 +/- 0.0149      | 6.90%          |
| Q4_K_S       | 37.58      | 3.0050 +/- 0.0148      | 6.15%          |
| Q4_K_M       | 39.60      | 2.9674 +/- 0.0146      | 4.83%          |
| Q5_K_S       | 45.32      | 2.8843 +/- 0.0141      | 1.89%          |
| Q5_K_M       | 46.52      | 2.8656 +/- 0.0139      | 1.23%          |
| Q6_K         | 53.91      | 2.8441 +/- 0.0138      | 0.47%          |
| Q8_0         | 69.83      | 2.8316 +/- 0.0138      | 0.03%          |
| F16          | 131.43     | 2.8308 +/- 0.0138      | 0.00%          |
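The delta column above is simply the relative perplexity increase over the F16 baseline, expressed as a percentage. A small Python sketch (values copied from the table) shows the arithmetic:

```python
# Derive the "Delta vs. FP16" column from the perplexity values above.
# Each delta is the relative increase over the F16 baseline, in percent.

F16_PPL = 2.8308  # F16 perplexity on wiki.test, from the table

def delta_vs_f16(ppl: float, baseline: float = F16_PPL) -> float:
    """Relative perplexity increase over the F16 baseline, in percent."""
    return (ppl / baseline - 1.0) * 100.0

# A few rows from the table, quantization name -> perplexity.
quants = {
    "IQ1_S": 9.8655,
    "Q4_K_M": 2.9674,
    "Q8_0": 2.8316,
}

for name, ppl in quants.items():
    print(f"{name}: +{delta_vs_f16(ppl):.2f}%")
# prints IQ1_S: +248.51%, Q4_K_M: +4.83%, Q8_0: +0.03%
```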
**Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the model [README](https://github.com/meta-llama/llama3). For more technical information about generation parameters and recipes for how to use Llama 3 in applications, please go [here](https://github.com/meta-llama/llama-recipes).
## License