---
license: llama3
tags:
- llama
- llama-3
- meta
- facebook
- gguf
---

Converted and quantized directly to GGUF with `llama.cpp` (release tag: b2843) from Meta's `Meta-Llama-3` repository on Hugging Face.
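
For reference, the convert-then-quantize flow in `llama.cpp` at that release looks roughly like the following sketch (the paths and the Q4_K_M target are hypothetical examples, not the exact commands used for this repo):

```python
# Sketch of a typical llama.cpp convert-then-quantize flow (as of b2843).
# All paths and the Q4_K_M target are hypothetical examples.
import subprocess

# 1. Convert the original Hugging Face checkpoint to an F16 GGUF.
subprocess.run(
    ["python", "convert-hf-to-gguf.py", "Meta-Llama-3-8B",
     "--outfile", "Meta-Llama-3-8B-F16.gguf", "--outtype", "f16"],
    check=True,
)

# 2. Quantize the F16 GGUF down to a smaller type, e.g. Q4_K_M.
subprocess.run(
    ["./quantize", "Meta-Llama-3-8B-F16.gguf",
     "Meta-Llama-3-8B-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```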

The original LLaMA 3 model files, cloned from the [Meta HF repo](https://huggingface.co/meta-llama/Meta-Llama-3-8B), are included as well.

If you have trouble downloading the models from Meta or converting them for `llama.cpp`, feel free to download this one instead!
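
For example, one way to fetch a single GGUF file and run it locally is with `huggingface_hub` and [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). This is a minimal sketch; the `repo_id` and `filename` below are hypothetical placeholders, so substitute the actual repo and file you want:

```python
# Minimal download-and-run sketch using huggingface_hub and llama-cpp-python.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="your-username/Meta-Llama-3-8B-GGUF",  # hypothetical repo id
    filename="Meta-Llama-3-8B-Q4_K_M.gguf",        # hypothetical file name
)

llm = Llama(model_path=model_path, n_ctx=8192)  # Llama 3 supports 8K context
out = llm("The capital of France is", max_tokens=16)
print(out["choices"][0]["text"])
```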

## Perplexity table on LLaMA 3 70B

Lower perplexity is better. (credit to: [dranger003](https://github.com/ggerganov/llama.cpp/pull/6745#issuecomment-2093892514))
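
Numbers like these are typically produced with `llama.cpp`'s `perplexity` tool on the WikiText-2 test split; a sketch of an equivalent invocation (model and dataset paths are hypothetical):

```python
# Sketch of a perplexity run with llama.cpp's `perplexity` tool.
# Model and dataset paths are hypothetical.
import subprocess

subprocess.run(
    ["./perplexity",
     "-m", "Meta-Llama-3-70B-Q4_K_M.gguf",  # quantized model under test
     "-f", "wiki.test.raw"],                # WikiText-2 test split
    check=True,
)
```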

| Quantization | Size (GiB) | Perplexity (wiki.test) | Delta vs. F16 |
|--------------|------------|------------------------|---------------|
| IQ1_S        | 14.29      | 9.8655 +/- 0.0625      | 248.51%     |
| IQ1_M        | 15.60      | 8.5193 +/- 0.0530      | 201.94%     |
| IQ2_XXS      | 17.79      | 6.6705 +/- 0.0405      | 135.64%     |
| IQ2_XS       | 19.69      | 5.7486 +/- 0.0345      | 103.07%     |
| IQ2_S        | 20.71      | 5.5215 +/- 0.0318      | 95.05%      |
| Q2_K_S       | 22.79      | 5.4334 +/- 0.0325      | 91.94%      |
| IQ2_M        | 22.46      | 4.8959 +/- 0.0276      | 72.35%      |
| Q2_K         | 24.56      | 4.7763 +/- 0.0274      | 68.73%      |
| IQ3_XXS      | 25.58      | 3.9671 +/- 0.0211      | 40.14%      |
| IQ3_XS       | 27.29      | 3.7210 +/- 0.0191      | 31.45%      |
| Q3_K_S       | 28.79      | 3.6502 +/- 0.0192      | 28.95%      |
| IQ3_S        | 28.79      | 3.4698 +/- 0.0174      | 22.57%      |
| IQ3_M        | 29.74      | 3.4402 +/- 0.0171      | 21.53%      |
| Q3_K_M       | 31.91      | 3.3617 +/- 0.0172      | 18.75%      |
| Q3_K_L       | 34.59      | 3.3016 +/- 0.0168      | 16.63%      |
| IQ4_XS       | 35.30      | 3.0310 +/- 0.0149      | 7.07%       |
| IQ4_NL       | 37.30      | 3.0261 +/- 0.0149      | 6.90%       |
| Q4_K_S       | 37.58      | 3.0050 +/- 0.0148      | 6.15%       |
| Q4_K_M       | 39.60      | 2.9674 +/- 0.0146      | 4.83%       |
| Q5_K_S       | 45.32      | 2.8843 +/- 0.0141      | 1.89%       |
| Q5_K_M       | 46.52      | 2.8656 +/- 0.0139      | 1.23%       |
| Q6_K         | 53.91      | 2.8441 +/- 0.0138      | 0.47%       |
| Q8_0         | 69.83      | 2.8316 +/- 0.0138      | 0.03%       |
| F16          | 131.43     | 2.8308 +/- 0.0138      | 0.00%       |
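
The "Delta vs. F16" column is simply the relative increase in perplexity over the F16 baseline, expressed as a percentage. A quick check against two rows of the table:

```python
# Reproduce the delta column: percent increase in perplexity over F16.
F16_PPL = 2.8308  # F16 baseline perplexity from the table

def delta_vs_f16(ppl: float) -> float:
    return (ppl - F16_PPL) / F16_PPL * 100.0

print(f"Q4_K_M: {delta_vs_f16(2.9674):.2f}%")  # 4.83%, matches the table
print(f"IQ1_S:  {delta_vs_f16(9.8655):.2f}%")  # 248.51%, matches the table
```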

## Questions and Feedback

Instructions on how to provide feedback or comments on the model can be found in the Meta Llama 3 [README](https://github.com/meta-llama/llama3). For more technical information about generation parameters and recipes for using Llama 3 in applications, see [llama-recipes](https://github.com/meta-llama/llama-recipes).

## License

See the Meta Llama 3 license [here](https://llama.meta.com/llama3/license/) and the Acceptable Use Policy [here](https://llama.meta.com/llama3/use-policy/).