Model
- Gemma 2 27B Instruction-Tuned (gemma-2-27b-it), quantized to IQ3_M
- Fits on a single NVIDIA T4 GPU (16 GB)
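A rough back-of-the-envelope check of why the IQ3_M file fits on a T4 is weight size ≈ parameters × bits-per-weight / 8. The sketch below assumes about 27.2 billion parameters for Gemma 2 27B and IQ3_M's nominal 3.66 bits per weight as listed by llama.cpp's quantize tool. Real GGUF files differ somewhat because some tensors are kept at other precisions, and the KV cache and compute buffers need VRAM on top of the weights.

```python
# Rough GGUF weight-size estimate: n_params * bits_per_weight / 8 bytes.
# ASSUMPTIONS: ~27.2e9 parameters for Gemma 2 27B; 3.66 bpw is IQ3_M's nominal
# rate. Actual file sizes vary slightly because of mixed-precision tensors.

def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Return approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


GEMMA2_27B_PARAMS = 27.2e9  # assumption: approximate parameter count
IQ3_M_BPW = 3.66            # assumption: nominal bits per weight for IQ3_M
T4_VRAM_GB = 16.0

weights_gb = estimate_weights_gb(GEMMA2_27B_PARAMS, IQ3_M_BPW)
fp16_gb = estimate_weights_gb(GEMMA2_27B_PARAMS, 16)  # unquantized fp16, for comparison
headroom_gb = T4_VRAM_GB - weights_gb                 # left for KV cache and buffers

print(f"IQ3_M weights ~= {weights_gb:.2f} GB (fp16 would be ~= {fp16_gb:.1f} GB)")
print(f"headroom on a 16 GB T4 ~= {headroom_gb:.2f} GB")
```

The same arithmetic shows why an unquantized fp16 copy cannot fit on a single T4 at all.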
Usage (llama-cli with GPU):
llama-cli -m ./gemma-2-27b-it-IQ3_M.gguf -ngl 42 --temp 0 --repeat-penalty 1.0 --color -p "Why is the sky blue?"
Usage (llama-cli with CPU):
llama-cli -m ./gemma-2-27b-it-IQ3_M.gguf --temp 0 --repeat-penalty 1.0 --color -p "Why is the sky blue?"
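The GPU and CPU commands differ only in the -ngl flag, which sets how many model layers llama.cpp offloads to the GPU. The hypothetical helper below (build_llama_cli_args is an illustrative name, not part of llama.cpp) assembles either command as an argument list and annotates each flag. It assumes llama-cli is on PATH if you choose to run the result.

```python
import shlex
from typing import List, Optional


def build_llama_cli_args(model_path: str, prompt: str,
                         n_gpu_layers: Optional[int] = None) -> List[str]:
    """Hypothetical helper: build the llama-cli command from the examples.

    n_gpu_layers=None omits -ngl, giving the CPU-only variant.
    """
    args = ["llama-cli", "-m", model_path]
    if n_gpu_layers is not None:
        # -ngl: number of model layers to offload to the GPU
        args += ["-ngl", str(n_gpu_layers)]
    args += [
        "--temp", "0",              # temperature 0: always pick the most likely token
        "--repeat-penalty", "1.0",  # 1.0 applies no repetition penalty
        "--color",                  # colorize terminal output
        "-p", prompt,               # the prompt to complete
    ]
    return args


model = "./gemma-2-27b-it-IQ3_M.gguf"
gpu_cmd = build_llama_cli_args(model, "Why is the sky blue?", n_gpu_layers=42)
cpu_cmd = build_llama_cli_args(model, "Why is the sky blue?")

print(shlex.join(gpu_cmd))
print(shlex.join(cpu_cmd))
# To launch it (requires llama-cli on PATH):
#   import subprocess; subprocess.run(gpu_cmd, check=True)
```

Passing a list to subprocess.run avoids shell-quoting problems with prompts that contain spaces or punctuation.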
Usage (llama-cpp-python via Hugging Face Hub):
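from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub (needs huggingface-hub) and load it
llm = Llama.from_pretrained(
    repo_id="chenghenry/gemma-2-27b-it-GGUF",  # no trailing space: it would make the repo ID invalid
    filename="gemma-2-27b-it-IQ3_M.gguf",
    n_ctx=8192,        # context window in tokens
    n_batch=2048,      # prompt-processing batch size
    n_gpu_layers=100,  # more layers than the model has, so all of them go to the GPU
    verbose=False,
    chat_format="gemma",
)

prompt = "Why is the sky blue?"
messages = [{"role": "user", "content": prompt}]

# temperature=0 and repeat_penalty=1.0 mirror the llama-cli settings above
response = llm.create_chat_completion(
    messages=messages,
    repeat_penalty=1.0,
    temperature=0,
)
print(response["choices"][0]["message"]["content"])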
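To print the answer while it is being generated, create_chat_completion also accepts stream=True and then returns an iterator of chunks. The sketch below assumes the OpenAI-style chunk layout that llama-cpp-python follows, where each chunk's choices[0]["delta"] may or may not carry a "content" key (the first chunk typically holds only the role). collect_stream is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable


def collect_stream(chunks: Iterable[Dict[str, Any]], echo: bool = True) -> str:
    """Hypothetical helper: join the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        text = delta.get("content")  # absent in role-only and final chunks
        if text:
            parts.append(text)
            if echo:
                print(text, end="", flush=True)  # show tokens as they arrive
    if echo:
        print()
    return "".join(parts)


# Usage with the `llm` and `messages` objects from the example above (not run here):
# stream = llm.create_chat_completion(messages=messages, repeat_penalty=1.0,
#                                     temperature=0, stream=True)
# answer = collect_stream(stream)
```

Returning the joined text as well as echoing it keeps the full answer available after streaming finishes.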