---
language:
- en
- vi
- zh
base_model:
- google/gemma-2-9b-it
pipeline_tag: text-generation
tags:
- vllm
- system-role
- langchain
license: gemma
---

# gemma-2-9b-it-fix-system-role

Modified version of [gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) with an updated **`chat_template`** that supports the **`system`** role (see the example below), fixing the following errors:
- `Conversation roles must alternate user/assistant/user/assistant/...`
- `System role not supported`
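
A quick way to check the template change is to render a prompt that contains a system message. This is a minimal sketch; the exact text of the rendered prompt depends on the template shipped with this repository:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dangvansam/gemma-2-9b-it-fix-system-role")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]

# With the original google/gemma-2-9b-it template this call raises
# "System role not supported"; with the updated template it returns a prompt string.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```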

## Model Overview
- **Model Architecture:** Gemma 2
  - **Input:** Text
  - **Output:** Text
- **Release Date:** 04/12/2024
- **Version:** 1.0

## Deployment

### Use with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below.

With CLI:
```bash
vllm serve dangvansam/gemma-2-9b-it-fix-system-role
```
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "dangvansam/gemma-2-9b-it-fix-system-role",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"}
  ]
}'
```

With Python:
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "dangvansam/gemma-2-9b-it-fix-system-role"

sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)

tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
  {"role": "system", "content": "You are helpfull assistant."},
  {"role": "user", "content": "Who are you?"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_id)

outputs = llm.generate(prompt, sampling_params)

generated_text = outputs[0].outputs[0].text
print(generated_text)
```

vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
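
For example, once the server started in the CLI section above is running, the standard `openai` Python client can be pointed at it. This is a minimal sketch; `api_key="EMPTY"` assumes the server was launched without an API key:

```python
from openai import OpenAI

# Point the client at the local vLLM OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="dangvansam/gemma-2-9b-it-fix-system-role",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    temperature=0.6,
    max_tokens=256,
)
print(response.choices[0].message.content)
```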