dangvansam/gemma-2-2b-it-fix-system-role

	---
	language:
	- en
	- vi
	- zh
	base_model:
	- google/gemma-2-2b-it
	pipeline_tag: text-generation
	tags:
	- vllm
	- system-role
	- langchain
	license: gemma
	---

	# gemma-2-2b-it-fix-system-role

	A quantized version of [gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) with an updated `chat_template` that supports the `system` role. It avoids the following errors:
	- `Conversation roles must alternate user/assistant/user/assistant/...`
	- `System role not supported`
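
As a rough illustration of how a chat template can accept a `system` message in Gemma's turn format, the sketch below folds the system text into the first user turn. This is an assumption made for illustration only: `render_gemma_prompt` is a hypothetical helper, and the `chat_template` shipped in this repository may render the system message differently.

```python
def render_gemma_prompt(messages, add_generation_prompt=True):
    """Render chat messages into Gemma's turn format (illustrative only).

    Assumption: a leading system message is prepended to the first user turn.
    The real chat_template in this repository may behave differently.
    """
    system_text = ""
    if messages and messages[0]["role"] == "system":
        system_text = messages[0]["content"].strip()
        messages = messages[1:]

    parts = ["<bos>"]
    for message in messages:
        # Gemma's prompt format names the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        if system_text and role == "user":
            # Inject the system text once, into the first user turn.
            content = f"{system_text}\n\n{content}"
            system_text = ""
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")

    if add_generation_prompt:
        # Open a model turn so generation continues from here.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


example = render_gemma_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
])
print(example)
```

To see what the repository's template actually produces, call `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` and print the result.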

	## Model Overview
	- Model Architecture: Gemma 2
	- Input: Text
	- Output: Text
	- Release Date: 04/12/2024
	- Version: 1.0

	## Deployment

	### Use with vLLM

	This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the examples below.

	With CLI:
	```bash
	vllm serve dangvansam/gemma-2-2b-it-fix-system-role
	```
	```bash
	curl http://localhost:8000/v1/chat/completions \
	  -H "Content-Type: application/json" \
	  -d '{
	    "model": "dangvansam/gemma-2-2b-it-fix-system-role",
	    "messages": [
	      {"role": "system", "content": "You are a helpful assistant."},
	      {"role": "user", "content": "Who are you?"}
	    ]
	  }'
	```

	With Python:
	```python
	from vllm import LLM, SamplingParams
	from transformers import AutoTokenizer

	model_id = "dangvansam/gemma-2-2b-it-fix-system-role"

	# Sampling settings for generation.
	sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)

	# The tokenizer carries the updated chat_template that accepts the system role.
	tokenizer = AutoTokenizer.from_pretrained(model_id)

	messages = [
	    {"role": "system", "content": "You are a helpful assistant."},
	    {"role": "user", "content": "Who are you?"},
	]

	# Render the conversation into a single prompt string that ends with an open model turn.
	prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

	llm = LLM(model=model_id)

	# generate() returns one RequestOutput per prompt.
	outputs = llm.generate(prompt, sampling_params)

	generated_text = outputs[0].outputs[0].text
	print(generated_text)
	```

	vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
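
For reference, the curl request above can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started by `vllm serve` is listening on `http://localhost:8000`, as in the curl example. The helper names `build_chat_request` and `chat` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

# Assumption: same address as the curl example above.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "dangvansam/gemma-2-2b-it-fix-system-role"


def build_chat_request(messages, base_url=BASE_URL, model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages):
    """Send the request and return the assistant's reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(messages)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the request does not touch the network; only chat() does.
request = build_chat_request([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
])
print(request.full_url)
```

Once the server is running, `chat(messages)` returns the reply text for the same `messages` list.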