---
language:
- en
- vi
- zh
base_model:
- google/gemma-2-9b-it
pipeline_tag: text-generation
tags:
- vllm
- system-role
- langchain
license: gemma
---
# gemma-2-9b-it-fix-system-role
A modified version of [gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) with an updated **`chat_template`** that supports the **`system`** role, fixing the following errors (see the quick check below the list):
- `Conversation roles must alternate user/assistant/user/assistant/...`
- `System role not supported`
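A quick way to verify the updated template (a minimal sketch, assuming `transformers` is installed and the tokenizer can be downloaded; the exact rendered prompt depends on how the template handles the system message):
```python
from transformers import AutoTokenizer

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]

# The stock google/gemma-2-9b-it template raises "System role not supported"
# for the messages above; the modified template renders the system turn instead.
tokenizer = AutoTokenizer.from_pretrained("dangvansam/gemma-2-9b-it-fix-system-role")
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```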
## Model Overview
- **Model Architecture:** Gemma 2
- **Input:** Text
- **Output:** Text
- **Release Date:** 04/12/2024
- **Version:** 1.0
## Deployment
### Use with vLLM
This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the examples below.
With the CLI:
```bash
vllm serve dangvansam/gemma-2-9b-it-fix-system-role
```
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "dangvansam/gemma-2-9b-it-fix-system-role",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who are you?"}
]
}'
```
With Python:
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "dangvansam/gemma-2-9b-it-fix-system-role"
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)

# Build the prompt with the updated chat template, which accepts a system message
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Run offline inference with vLLM
llm = LLM(model=model_id)
outputs = llm.generate(prompt, sampling_params)
generated_text = outputs[0].outputs[0].text
print(generated_text)
```
vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
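For example, the server started above can be queried with the official `openai` Python client; a minimal sketch (the base URL and placeholder API key follow vLLM's defaults):
```python
from openai import OpenAI

# vLLM's OpenAI-compatible server listens on port 8000 by default;
# the API key can be any placeholder unless the server was started with --api-key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="dangvansam/gemma-2-9b-it-fix-system-role",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    temperature=0.6,
    top_p=0.9,
    max_tokens=256,
)
print(response.choices[0].message.content)
```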