
Llamacpp Quantizations of Meta-Llama-3.1-8B

Using llama.cpp release b3472 for quantization.

Original model: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B

Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Perplexity (wikitext-2-raw-v1.test) |
| -------- | ---------- | --------- | ------------------------------------ |
| Meta-Llama-3.1-8B-BF16.gguf | BF16 | 16.10GB | 6.4006 +/- 0.03938 |
| Meta-Llama-3.1-8B-FP16.gguf | FP16 | 16.10GB | 6.4016 +/- 0.03939 |
| Meta-Llama-3.1-8B-Q8_0.gguf | Q8_0 | 8.54GB | 6.4070 +/- 0.03941 |
| Meta-Llama-3.1-8B-Q6_K.gguf | Q6_K | 6.60GB | 6.4231 +/- 0.03957 |
| Meta-Llama-3.1-8B-Q5_K_M.gguf | Q5_K_M | 5.73GB | 6.4623 +/- 0.03987 |
| Meta-Llama-3.1-8B-Q5_K_S.gguf | Q5_K_S | 5.60GB | 6.5161 +/- 0.04028 |
| Meta-Llama-3.1-8B-Q4_K_M.gguf | Q4_K_M | 4.92GB | 6.5837 +/- 0.04068 |
| Meta-Llama-3.1-8B-Q4_K_S.gguf | Q4_K_S | 4.69GB | 6.6751 +/- 0.04125 |
| Meta-Llama-3.1-8B-Q3_K_L.gguf | Q3_K_L | 4.32GB | 6.9458 +/- 0.04329 |
| Meta-Llama-3.1-8B-Q3_K_M.gguf | Q3_K_M | 4.02GB | 7.0488 +/- 0.04384 |
| Meta-Llama-3.1-8B-Q3_K_S.gguf | Q3_K_S | 3.66GB | 7.8823 +/- 0.04920 |
| Meta-Llama-3.1-8B-Q2_K.gguf | Q2_K | 3.18GB | 9.7262 +/- 0.06393 |
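
For reference, perplexity figures like these are typically produced with llama.cpp's llama-perplexity tool. A minimal sketch, assuming a local llama.cpp build (the dataset URL below is the one used by llama.cpp's own helper scripts; binary names may differ between releases):

# illustrative: fetch the wikitext-2-raw test set, then evaluate one quant
wget https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip
./llama-perplexity -m Meta-Llama-3.1-8B-Q4_K_M.gguf -f wikitext-2-raw/wiki.test.raw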

Benchmark Results

| Benchmark | Quant type | Score |
| --------- | ---------- | ----- |
| WinoGrande (0-shot) | Q8_0 | 74.1121 +/- 1.2311 |
| WinoGrande (0-shot) | Q4_K_M | 73.1650 +/- 1.2453 |
| WinoGrande (0-shot) | Q3_K_M | 72.7703 +/- 1.2511 |
| WinoGrande (0-shot) | Q3_K_S | 72.3757 +/- 1.2567 |
| WinoGrande (0-shot) | Q2_K | 68.4294 +/- 1.3063 |
| HellaSwag (0-shot) | Q8_0 | 79.41645091 |
| HellaSwag (0-shot) | Q4_K_M | 79.05795658 |
| HellaSwag (0-shot) | Q3_K_M | 79.41645091 |
| HellaSwag (0-shot) | Q3_K_S | 76.93686517 |
| HellaSwag (0-shot) | Q2_K | 72.16689902 |
| MMLU (0-shot) | Q8_0 | 39.4703 +/- 1.2427 |
| MMLU (0-shot) | Q4_K_M | 39.5349 +/- 1.2431 |
| MMLU (0-shot) | Q3_K_M | 38.8889 +/- 1.2394 |
| MMLU (0-shot) | Q3_K_S | 37.2739 +/- 1.2294 |
| MMLU (0-shot) | Q2_K | 35.4651 +/- 1.2163 |
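
The same llama-perplexity tool exposes flags for benchmarks of this kind. A hedged sketch, assuming you have obtained the task data files separately (the file names below follow llama.cpp's helper scripts and are illustrative; see the Reproducibility link below for the exact procedure used for these numbers):

# illustrative task files; flags as in llama.cpp's perplexity tool
./llama-perplexity -m Meta-Llama-3.1-8B-Q8_0.gguf -f hellaswag_val_full.txt --hellaswag
./llama-perplexity -m Meta-Llama-3.1-8B-Q8_0.gguf -f winogrande-debiased-eval.csv --winogrande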

Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then, you can target the specific file you want:

huggingface-cli download fedric95/Meta-Llama-3.1-8B-GGUF --include "Meta-Llama-3.1-8B-Q4_K_M.gguf" --local-dir ./

If a model is larger than 50GB, it will have been split into multiple files. To download them all to a local folder, run:

huggingface-cli download fedric95/Meta-Llama-3.1-8B-GGUF --include "Meta-Llama-3.1-8B-Q8_0.gguf/*" --local-dir Meta-Llama-3.1-8B-Q8_0

You can either specify a new local-dir (Meta-Llama-3.1-8B-Q8_0) or download them all in place (./).
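
Once downloaded, a file can be run directly with llama.cpp. A minimal sketch (binary name as in recent llama.cpp releases; since this is a base model rather than an instruct model, a plain completion prompt is used instead of a chat template):

# illustrative prompt and token count
./llama-cli -m ./Meta-Llama-3.1-8B-Q4_K_M.gguf -p "The meaning of life is" -n 64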

Reproducibility

The perplexity and benchmark figures above can be reproduced by following the procedure described in https://github.com/ggerganov/llama.cpp/issues/8650#issuecomment-2261497976.
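
For orientation, the usual llama.cpp quantization flow behind quants like these looks like the sketch below (paths are illustrative; the linked issue documents the exact commands and build used here):

# illustrative: convert the HF checkpoint to BF16 GGUF, then quantize
python convert_hf_to_gguf.py /path/to/Meta-Llama-3.1-8B --outtype bf16 --outfile Meta-Llama-3.1-8B-BF16.gguf
./llama-quantize Meta-Llama-3.1-8B-BF16.gguf Meta-Llama-3.1-8B-Q4_K_M.gguf Q4_K_M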
