---
language:
- en
- vi
- zh
base_model:
- google/gemma-2-2b-it
pipeline_tag: text-generation
tags:
- vllm
- system-role
- langchain
license: gemma
---
# gemma-2-2b-it-fix-system-role
Quantized version of [gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) with an updated **`chat_template`** that supports the **`system`** role. It avoids the following errors:
- `Conversation roles must alternate user/assistant/user/assistant/...`
- `System role not supported`
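To illustrate the kind of change involved, here is a minimal sketch of one common way to add `system` support to a Gemma-style prompt: folding the system message into the first turn. This is an assumption for illustration only. The function `render_gemma_prompt` is hypothetical, and the actual `chat_template` shipped in this repository may handle the system role differently.

```python
def render_gemma_prompt(messages, add_generation_prompt=True):
    """Render chat messages in Gemma's turn format.

    Assumed strategy (not confirmed by this repo): a leading system
    message is prepended to the content of the first remaining turn.
    """
    messages = list(messages)  # shallow copy so the caller's list is untouched
    system_text = ""
    if messages and messages[0]["role"] == "system":
        system_text = messages.pop(0)["content"].strip()

    parts = ["<bos>"]
    for i, msg in enumerate(messages):
        # Gemma names the assistant role "model".
        role = "model" if msg["role"] == "assistant" else msg["role"]
        content = msg["content"].strip()
        if i == 0 and system_text:
            content = f"{system_text}\n\n{content}"
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")

    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
print(render_gemma_prompt(messages))
```

In practice, `tokenizer.apply_chat_template` applies the repository's real template, so a helper like this is only useful for understanding the idea.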
## Model Overview
- **Model Architecture:** Gemma 2
- **Input:** Text
- **Output:** Text
- **Release Date:** 04/12/2024
- **Version:** 1.0
## Deployment
### Use with vLLM
This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below.
With CLI:
```bash
vllm serve dangvansam/gemma-2-2b-it-fix-system-role
```
Then send a chat request to the running server:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dangvansam/gemma-2-2b-it-fix-system-role",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who are you?"}
    ]
  }'
```
With Python:
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "dangvansam/gemma-2-2b-it-fix-system-role"
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]

# Render the conversation to a single prompt string using the model's chat template.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_id)
outputs = llm.generate(prompt, sampling_params)

# One prompt in, so take the first completion of the first request.
generated_text = outputs[0].outputs[0].text
print(generated_text)
```
vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
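As a rough Python counterpart to the `curl` request above, the sketch below calls the OpenAI-compatible endpoint using only the standard library. It assumes a vLLM server is already running at `http://localhost:8000`. The helper names `build_chat_request` and `chat` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed local vLLM server address
MODEL_ID = "dangvansam/gemma-2-2b-it-fix-system-role"


def build_chat_request(messages, base_url=BASE_URL, model=MODEL_ID):
    """Build a POST request for the /chat/completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(messages)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat([{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Who are you?"}])` should return the generated reply as a string.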