
Llamacpp Quantizations of Meta-Llama-3.1-8B

Using llama.cpp release b3472 for quantization.

Original model: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B

Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Perplexity (wikitext-2-raw-v1.test) |
| -------- | ---------- | --------- | ------------------------------------ |
| Meta-Llama-3.1-8B-BF16.gguf | BF16 | 16.10GB | 6.4006 +/- 0.03938 |
| Meta-Llama-3.1-8B-FP16.gguf | FP16 | 16.10GB | 6.4016 +/- 0.03939 |
| Meta-Llama-3.1-8B-Q8_0.gguf | Q8_0 | 8.54GB | 6.4070 +/- 0.03941 |
| Meta-Llama-3.1-8B-Q6_K.gguf | Q6_K | 6.60GB | 6.4231 +/- 0.03957 |
| Meta-Llama-3.1-8B-Q5_K_M.gguf | Q5_K_M | 5.73GB | 6.4623 +/- 0.03987 |
| Meta-Llama-3.1-8B-Q5_K_S.gguf | Q5_K_S | 5.60GB | 6.5161 +/- 0.04028 |
| Meta-Llama-3.1-8B-Q4_K_M.gguf | Q4_K_M | 4.92GB | 6.5837 +/- 0.04068 |
| Meta-Llama-3.1-8B-Q4_K_S.gguf | Q4_K_S | 4.69GB | 6.6751 +/- 0.04125 |
| Meta-Llama-3.1-8B-Q3_K_L.gguf | Q3_K_L | 4.32GB | 6.9458 +/- 0.04329 |
| Meta-Llama-3.1-8B-Q3_K_M.gguf | Q3_K_M | 4.02GB | 7.0488 +/- 0.04384 |
| Meta-Llama-3.1-8B-Q3_K_S.gguf | Q3_K_S | 3.66GB | 7.8823 +/- 0.04920 |
| Meta-Llama-3.1-8B-Q2_K.gguf | Q2_K | 3.18GB | 9.7262 +/- 0.06393 |
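
For reference, perplexity figures like these are typically produced with llama.cpp's llama-perplexity tool. A minimal sketch, assuming a local llama.cpp build (the dataset URL below is the one used by llama.cpp's own helper scripts; binary names may differ between releases):

# illustrative: fetch the wikitext-2-raw test set, then evaluate one quant
wget https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip
./llama-perplexity -m Meta-Llama-3.1-8B-Q4_K_M.gguf -f wikitext-2-raw/wiki.test.raw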

Benchmark Results

| Benchmark | Quant type | Score |
| --------- | ---------- | ----- |
| WinoGrande (0-shot) | Q8_0 | 74.1121 +/- 1.2311 |
| WinoGrande (0-shot) | Q4_K_M | 73.1650 +/- 1.2453 |
| WinoGrande (0-shot) | Q3_K_M | 72.7703 +/- 1.2511 |
| WinoGrande (0-shot) | Q3_K_S | 72.3757 +/- 1.2567 |
| WinoGrande (0-shot) | Q2_K | 68.4294 +/- 1.3063 |
| HellaSwag (0-shot) | Q8_0 | 79.41645091 |
| HellaSwag (0-shot) | Q4_K_M | 79.05795658 |
| HellaSwag (0-shot) | Q3_K_M | 79.41645091 |
| HellaSwag (0-shot) | Q3_K_S | 76.93686517 |
| HellaSwag (0-shot) | Q2_K | 72.16689902 |
| MMLU (0-shot) | Q8_0 | 39.4703 +/- 1.2427 |
| MMLU (0-shot) | Q4_K_M | 39.5349 +/- 1.2431 |
| MMLU (0-shot) | Q3_K_M | 38.8889 +/- 1.2394 |
| MMLU (0-shot) | Q3_K_S | 37.2739 +/- 1.2294 |
| MMLU (0-shot) | Q2_K | 35.4651 +/- 1.2163 |
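
The same llama-perplexity tool exposes flags for benchmarks of this kind. A hedged sketch, assuming you have obtained the task data files separately (the file names below follow llama.cpp's helper scripts and are illustrative; see the Reproducibility link below for the exact procedure used for these numbers):

# illustrative task files; flags as in llama.cpp's perplexity tool
./llama-perplexity -m Meta-Llama-3.1-8B-Q8_0.gguf -f hellaswag_val_full.txt --hellaswag
./llama-perplexity -m Meta-Llama-3.1-8B-Q8_0.gguf -f winogrande-debiased-eval.csv --winogrande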

Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then, you can target the specific file you want:

huggingface-cli download fedric95/Meta-Llama-3.1-8B-GGUF --include "Meta-Llama-3.1-8B-Q4_K_M.gguf" --local-dir ./

If a model is larger than 50GB, it will have been split into multiple files. To download them all to a local folder, run:

huggingface-cli download fedric95/Meta-Llama-3.1-8B-GGUF --include "Meta-Llama-3.1-8B-Q8_0.gguf/*" --local-dir Meta-Llama-3.1-8B-Q8_0

You can either specify a new local-dir (Meta-Llama-3.1-8B-Q8_0) or download them all in place (./).
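
Once downloaded, a file can be run directly with llama.cpp. A minimal sketch (binary name as in recent llama.cpp releases; since this is a base model rather than an instruct model, a plain completion prompt is used instead of a chat template):

# illustrative prompt and token count
./llama-cli -m ./Meta-Llama-3.1-8B-Q4_K_M.gguf -p "The meaning of life is" -n 64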

Reproducibility

The perplexity and benchmark figures above can be reproduced by following the procedure described in https://github.com/ggerganov/llama.cpp/issues/8650#issuecomment-2261497976.
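
For orientation, the usual llama.cpp quantization flow behind quants like these looks like the sketch below (paths are illustrative; the linked issue documents the exact commands and build used here):

# illustrative: convert the HF checkpoint to BF16 GGUF, then quantize
python convert_hf_to_gguf.py /path/to/Meta-Llama-3.1-8B --outtype bf16 --outfile Meta-Llama-3.1-8B-BF16.gguf
./llama-quantize Meta-Llama-3.1-8B-BF16.gguf Meta-Llama-3.1-8B-Q4_K_M.gguf Q4_K_M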
