# Gemma 2 9b Instruction Tuned - GGUF
These are GGUF quants of google/gemma-2-9b-it. Details about the model can be found on its model page.
## Llama.cpp Version
These quants were made with llama.cpp tag b3408. If you have problems loading these models, please update your software to use the latest llama.cpp release.
## Perplexity Scoring
Below are the perplexity scores for the GGUF models. A lower score is better.
| Quant Level | Perplexity Score | Standard Deviation |
| --- | --- | --- |
| F32 | 8.7849 | 0.06498 |
| BF16 | 8.7849 | 0.06498 |
| Q8_0 | 8.7869 | 0.06500 |
| Q6_K | 8.7972 | 0.06510 |
| Q5_K_M | 8.7791 | 0.06489 |
| Q5_K_S | 8.7899 | 0.06503 |
| Q4_K_M | 8.8745 | 0.06575 |
| Q4_K_S | 8.9293 | 0.06636 |
| Q3_K_L | 9.0210 | 0.06693 |
| Q3_K_M | 9.1213 | 0.06784 |
| Q3_K_S | 9.1857 | 0.06726 |
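To put the numbers above in perspective, the following sketch (values copied from the table) compares each quant's score against the F32 baseline and flags whether the difference falls within one standard deviation, i.e. is statistically indistinguishable from full precision:

```python
# Perplexity scores from the table above: quant -> (perplexity, std deviation).
scores = {
    "F32": (8.7849, 0.06498),
    "BF16": (8.7849, 0.06498),
    "Q8_0": (8.7869, 0.06500),
    "Q6_K": (8.7972, 0.06510),
    "Q5_K_M": (8.7791, 0.06489),
    "Q5_K_S": (8.7899, 0.06503),
    "Q4_K_M": (8.8745, 0.06575),
    "Q4_K_S": (8.9293, 0.06636),
    "Q3_K_L": (9.0210, 0.06693),
    "Q3_K_M": (9.1213, 0.06784),
    "Q3_K_S": (9.1857, 0.06726),
}

baseline, base_sd = scores["F32"]
for quant, (ppl, sd) in scores.items():
    delta = ppl - baseline
    # A delta within one standard deviation of the baseline is noise-level.
    within_sd = abs(delta) <= base_sd
    print(f"{quant:8s} ppl={ppl:.4f} delta={delta:+.4f} within 1 sd: {within_sd}")
```

By this rough yardstick, Q5_K_S and above sit within one standard deviation of F32, while the Q4 and Q3 quants show a measurable (if small) perplexity increase.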
## Quant Details
This is the script used for quantization.

```bash
#!/bin/bash

# Define the model name
MODEL_NAME="gemma-2-9b-it"

# Define the output directory and create it if it doesn't exist
outputDir="${MODEL_NAME}-GGUF"
mkdir -p "${outputDir}"

# Make the F32 quant
f32file="${outputDir}/${MODEL_NAME}-F32.gguf"
if [ -f "${f32file}" ]; then
    echo "Skipping f32 as ${f32file} already exists."
else
    # Note: a tilde is not expanded inside quotes, so use ${HOME} instead.
    python convert_hf_to_gguf.py "${HOME}/src/models/${MODEL_NAME}" --outfile "${f32file}" --outtype "f32"
fi

# Abort if the F32 conversion didn't work
if [ ! -f "${f32file}" ]; then
    echo "No ${f32file} found."
    exit 1
fi

# Define the array of quantization strings
quants=("Q8_0" "Q6_K" "Q5_K_M" "Q5_K_S" "Q4_K_M" "Q4_K_S" "Q3_K_L" "Q3_K_M" "Q3_K_S")

# Loop through the quants array, skipping any output files that already exist
for quant in "${quants[@]}"; do
    outfile="${outputDir}/${MODEL_NAME}-${quant}.gguf"
    if [ -f "${outfile}" ]; then
        echo "Skipping ${quant} as ${outfile} already exists."
    else
        # Quantize from the F32 source to the current quant level
        ./llama-quantize "${f32file}" "${outfile}" "${quant}"
        echo "Processed ${quant} and generated ${outfile}"
    fi
done
```
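If a downloaded quant fails to load, it is worth ruling out file truncation or corruption before updating software. Per the GGUF specification, every file starts with the 4-byte magic `GGUF` followed by a little-endian uint32 format version. A minimal sketch (the helper name `read_gguf_header` is ours, and the demo uses a synthetic header rather than a real model file):

```python
import struct
import tempfile

def read_gguf_header(path):
    """Return the GGUF format version, or raise if the magic bytes are wrong.

    A GGUF file begins with the magic b"GGUF" followed by a
    little-endian uint32 version number.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
    return version

# Demo with a synthetic 8-byte header (real files carry metadata and
# tensor data after these bytes).
with tempfile.NamedTemporaryFile(suffix=".gguf", delete=False) as tmp:
    tmp.write(b"GGUF" + struct.pack("<I", 3))
    demo_path = tmp.name

print(read_gguf_header(demo_path))  # prints the version from the header: 3
```

A wrong magic usually means the download was interrupted or the file was saved through an HTML error page; re-downloading the quant is the fix in either case.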