|
---
license: gemma
library_name: transformers
base_model: google/gemma-2-9b-it
---
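## Download (huggingface-cli):

The examples below assume `gemma-2-9b-it-Q6_K.gguf` is in the current directory. One way to fetch it, assuming a recent `huggingface_hub` with the CLI extra installed:

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download chenghenry/gemma-2-9b-it-GGUF gemma-2-9b-it-Q6_K.gguf --local-dir .
```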
|
|
|
## Usage (llama-cli with GPU): |
|
```bash
llama-cli -m ./gemma-2-9b-it-Q6_K.gguf -ngl 100 --temp 0 --repeat-penalty 1.0 --color -p "Why is the sky blue?"
```
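Here `-ngl 100` offloads up to 100 layers to the GPU, which is more than this model has, so the whole model runs on the GPU (a llama.cpp build with GPU support is required). `--temp 0` selects greedy, deterministic decoding, and `--repeat-penalty 1.0` disables the repetition penalty.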
|
|
|
## Usage (llama-cli with CPU): |
|
```bash
llama-cli -m ./gemma-2-9b-it-Q6_K.gguf --temp 0 --repeat-penalty 1.0 --color -p "Why is the sky blue?"
```
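This is the same command without `-ngl`; llama-cli defaults to offloading no layers, so inference runs entirely on the CPU.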
|
|
|
## Usage (llama-cpp-python via Hugging Face Hub): |
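This example assumes `llama-cpp-python` and `huggingface_hub` are installed; `Llama.from_pretrained` uses `huggingface_hub` to download and cache the GGUF. A rough install sketch (the CUDA CMake flag is taken from llama-cpp-python's docs and may differ across versions):

```bash
pip install llama-cpp-python huggingface-hub

# Optional: rebuild with CUDA support so n_gpu_layers takes effect
# (flag per llama-cpp-python's docs; check the version you use)
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```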
|
```python
from llama_cpp import Llama

# Download the GGUF from the Hub (cached locally) and load it,
# offloading all layers to the GPU.
llm = Llama.from_pretrained(
    repo_id="chenghenry/gemma-2-9b-it-GGUF",
    filename="gemma-2-9b-it-Q6_K.gguf",
    n_ctx=8192,
    n_batch=2048,
    n_gpu_layers=100,
    verbose=False,
    chat_format="gemma",
)

prompt = "Why is the sky blue?"

# Greedy decoding with the repetition penalty disabled.
messages = [{"role": "user", "content": prompt}]
response = llm.create_chat_completion(
    messages=messages,
    repeat_penalty=1.0,
    temperature=0,
)

print(response["choices"][0]["message"]["content"])
```
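The same call supports streaming for incremental output; a minimal sketch using llama-cpp-python's OpenAI-style chunk format:

```python
# Reuse llm and messages from above; stream tokens as they are generated.
for chunk in llm.create_chat_completion(
    messages=messages,
    repeat_penalty=1.0,
    temperature=0,
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    # Some chunks carry role or finish metadata instead of text.
    print(delta.get("content", ""), end="", flush=True)
print()
```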