Dracones/gemma-2-27b-it-GGUF
---
license: gemma
library_name: transformers
pipeline_tag: text-generation
tags:
- conversational
- gguf
- llamacpp
---



# Gemma 2 27B Instruction-Tuned - GGUF

These are GGUF quants of [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it).

Details about the model can be found on the model page linked above.

## llama.cpp Version

These quants were made with llama.cpp release tag `b3408`.

If you have problems loading these models, please update your software to use the latest llama.cpp version.
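One quick way to tell whether a local build is older than the one used for these files is to compare build numbers. The sketch below is hypothetical: it assumes the first line printed by `./llama-cli --version` has the form `version: 3408 (<commit>)`, and the helper names and `REQUIRED_BUILD` variable are illustrative only.

```bash
# Illustrative minimum build number, taken from the b3408 tag above
REQUIRED_BUILD=3408

# Extract the integer after "version: " (assumed output format)
build_number_from_version() {
    printf '%s\n' "$1" | sed -n 's/^version: \([0-9][0-9]*\).*/\1/p'
}

# Succeed only if a build number was found and it is >= REQUIRED_BUILD
build_is_recent_enough() {
    local build
    build=$(build_number_from_version "$1")
    [ -n "${build}" ] && [ "${build}" -ge "${REQUIRED_BUILD}" ]
}
```

For example: `build_is_recent_enough "$(./llama-cli --version 2>&1 | head -n 1)" || echo "llama.cpp build is older than b3408"`. The `2>&1` captures the version line whether it is written to stdout or stderr.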


## Perplexity Scoring

Below are the perplexity scores for the GGUF models. A lower score is better.

| Quant Level | Perplexity Score | Standard Deviation |
|-------------|------------------|--------------------|
| F32 | 7.1853 | 0.04922 |
| BF16 | 7.1853 | 0.04922 |
| Q8_0 | 7.1879 | 0.04924 |
| Q6_K | 7.2182 | 0.04948 |
| Q5_K_M | 7.2333 | 0.04953 |
| Q5_K_S | 7.2204 | 0.04931 |
| Q4_K_M | 7.4192 | 0.05149 |
| Q4_K_S | 7.5403 | 0.05231 |
| Q3_K_L | 7.4623 | 0.05128 |
| Q3_K_M | 7.7375 | 0.05362 |
| Q3_K_S | 8.0426 | 0.05546 |
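To read the table as a relative cost, each score can be compared with the F32 baseline as `(PPL_quant - PPL_F32) / PPL_F32`. The sketch below does this with `awk` on the values copied from the table; the function name `relative_increase` is illustrative.

```bash
# Quant level and perplexity score, copied from the table (F32 first = baseline)
scores="F32 7.1853
BF16 7.1853
Q8_0 7.1879
Q6_K 7.2182
Q5_K_M 7.2333
Q5_K_S 7.2204
Q4_K_M 7.4192
Q4_K_S 7.5403
Q3_K_L 7.4623
Q3_K_M 7.7375
Q3_K_S 8.0426"

# Print each quant's perplexity increase over the first row, as a percentage
relative_increase() {
    printf '%s\n' "$1" | awk 'NR == 1 { base = $2 } { printf "%s %.2f%%\n", $1, ($2 - base) / base * 100 }'
}

relative_increase "${scores}"
```

By this measure Q8_0 is about 0.04% above F32 and Q3_K_S about 11.93%. Gaps between neighboring quants such as Q5_K_M and Q5_K_S are smaller than the listed standard deviation, so their ordering may not be meaningful.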


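## Quant Details

This is the script used for quantization. It calls `convert_hf_to_gguf.py` and `./llama-quantize` by relative path, so it must be run from a directory containing both (for example, a built llama.cpp checkout), and it reads the original model from `~/src/models/gemma-2-27b-it`.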

```bash
#!/bin/bash

# Name of the model directory (under ~/src/models) to convert
MODEL_NAME="gemma-2-27b-it"

# Define the output directory
outputDir="${MODEL_NAME}-GGUF"

# Create the output directory if it doesn't exist
mkdir -p "${outputDir}"

# Make the F32 quant
f32file="${outputDir}/${MODEL_NAME}-F32.gguf"
if [ -f "${f32file}" ]; then
    echo "Skipping f32 as ${f32file} already exists."
else
    # Use ${HOME}, not "~": a tilde inside double quotes is not expanded by bash
    python convert_hf_to_gguf.py "${HOME}/src/models/${MODEL_NAME}" --outfile "${f32file}" --outtype "f32"
fi

# Abort if the F32 conversion didn't produce a file
if [ ! -f "${f32file}" ]; then
    echo "No ${f32file} found."
    exit 1
fi

# Define the array of quantization strings
quants=("Q8_0" "Q6_K" "Q5_K_M" "Q5_K_S" "Q4_K_M" "Q4_K_S" "Q3_K_L" "Q3_K_M" "Q3_K_S")

# Loop through the quants array
for quant in "${quants[@]}"; do
    outfile="${outputDir}/${MODEL_NAME}-${quant}.gguf"

    # Check if the outfile already exists
    if [ -f "${outfile}" ]; then
        echo "Skipping ${quant} as ${outfile} already exists."
    # Run llama-quantize with the current quant string and report the result
    elif ./llama-quantize "${f32file}" "${outfile}" "${quant}"; then
        echo "Processed ${quant} and generated ${outfile}"
    else
        echo "Failed to quantize ${quant}" >&2
        # Remove any partial output so a rerun does not skip this quant
        rm -f "${outfile}"
    fi
done
```
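After the script finishes, a cheap sanity check is to confirm that every output file starts with the 4-byte GGUF magic (`GGUF`). This is a sketch with hypothetical helper names (`is_gguf`, `check_outputs`); it only catches empty or badly truncated files and does not validate metadata or tensors, so loading a quant in llama.cpp remains the real test.

```bash
# Succeed if the file exists and its first 4 bytes are the ASCII magic "GGUF"
is_gguf() {
    [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Report every *.gguf file in a directory; return non-zero if any fail the check
check_outputs() {
    local dir="$1" status=0 f
    for f in "${dir}"/*.gguf; do
        [ -e "${f}" ] || continue   # no matches: the glob stays literal
        if is_gguf "${f}"; then
            echo "OK   ${f}"
        else
            echo "BAD  ${f}"
            status=1
        fi
    done
    return "${status}"
}
```

For example, `check_outputs "gemma-2-27b-it-GGUF"` prints one line per file and returns non-zero if any file fails the check.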