|
--- |
|
language: |
|
- en |
|
- vi |
|
- zh |
|
base_model: |
|
- google/gemma-2-2b-it |
|
pipeline_tag: text-generation |
|
tags: |
|
- vllm |
|
- system-role |
|
- langchain |
|
license: gemma |
|
--- |
|
|
|
# gemma-2-2b-it-fix-system-role |
|
|
|
Quantized version of [gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) with an updated **`chat_template`** that supports the **`system`** role, fixing the following errors:
|
- `Conversation roles must alternate user/assistant/user/assistant/...` |
|
- `System role not supported` |
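
Below is a minimal sketch of the difference, assuming you have access to the gated base model on the Hub; the stock Gemma template raises the error above when a system message is present, while the patched template renders it into the prompt:

```python
from transformers import AutoTokenizer

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]

# Stock template: raises "System role not supported"
base = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
try:
    base.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
except Exception as e:
    print(e)

# Patched template: the system message is accepted and rendered into the prompt
fixed = AutoTokenizer.from_pretrained("dangvansam/gemma-2-2b-it-fix-system-role")
print(fixed.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```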
|
|
|
## Model Overview |
|
- **Model Architecture:** Gemma 2 |
|
- **Input:** Text |
|
- **Output:** Text |
|
- **Release Date:** 04/12/2024 |
|
- **Version:** 1.0 |
|
|
|
## Deployment |
|
|
|
### Use with vLLM |
|
|
|
This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below. |
|
|
|
With CLI: |
|
```bash |
|
vllm serve dangvansam/gemma-2-2b-it-fix-system-role
|
``` |
|
```bash |
|
curl http://localhost:8000/v1/chat/completions \ |
|
-H "Content-Type: application/json" \ |
|
-d '{ |
|
"model": "dangvansam/gemma-2-2b-it-fix-system-role", |
|
"messages": [ |
|
{"role": "system", "content": "You are a helpful assistant."}, |
|
{"role": "user", "content": "Who are you?"} |
|
] |
|
}' |
|
``` |
|
|
|
With Python: |
|
```python |
|
from vllm import LLM, SamplingParams |
|
from transformers import AutoTokenizer |
|
|
|
model_id = "dangvansam/gemma-2-2b-it-fix-system-role" |
|
|
|
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256) |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
|
|
messages = [ |
|
{"role": "system", "content": "You are helpfull assistant."}, |
|
{"role": "user", "content": "Who are you?"} |
|
] |
|
|
|
prompts = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
|
|
|
llm = LLM(model=model_id) |
|
|
|
outputs = llm.generate(prompts, sampling_params) |
|
|
|
generated_text = outputs[0].outputs[0].text |
|
print(generated_text) |
|
``` |
|
|
|
vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details. |
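
For example, once the server from the CLI example above is running, any OpenAI-compatible client can call it. Below is a sketch using the official `openai` Python client; the `api_key` value is a placeholder, since vLLM does not validate it by default:

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="dangvansam/gemma-2-2b-it-fix-system-role",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    temperature=0.6,
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Frameworks that speak the OpenAI API (for example LangChain's `ChatOpenAI` with a custom `base_url`) can use the same endpoint.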