---
language:
- ru
base_model: t-tech/T-pro-it-1.0
tags:
- vllm
- bnb
- bitsandbytes
- 8bit
---
# vitekkor/T-pro-it-1.0-bnb-8bit
This model is an 8-bit quantization of [`t-tech/T-pro-it-1.0`](https://huggingface.co/t-tech/T-pro-it-1.0), produced with bitsandbytes.
Refer to the [original model card](https://huggingface.co/t-tech/T-pro-it-1.0) for more details on the model.
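For reference, an equivalent 8-bit checkpoint can be produced from the original weights with a `BitsAndBytesConfig`. The sketch below is illustrative, not the exact export script: it assumes default bitsandbytes LLM.int8() settings and an output path chosen only for this example.
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the original model to 8 bits on load (default LLM.int8() settings).
quant_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "t-tech/T-pro-it-1.0",
    quantization_config=quant_config,
    device_map="auto",
)

# Persist the quantized weights; the path here is a placeholder for this sketch.
model.save_pretrained("T-pro-it-1.0-bnb-8bit")
```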
## Use with transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "vitekkor/T-pro-it-1.0-bnb-8bit"

# Load the 8-bit quantized model; bitsandbytes must be installed.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

prompt = "Напиши стих про машинное обучение"  # "Write a poem about machine learning"
messages = [
    # System prompt: "You are T-pro, a virtual assistant at T-Technologies.
    # Your task is to be a helpful dialogue assistant."
    {"role": "system", "content": "Ты T-pro, виртуальный ассистент в Т-Технологии. Твоя задача - быть полезным диалоговым ассистентом."},
    {"role": "user", "content": prompt}
]

# Render the chat template, tokenize, and generate.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256
)
# Strip the prompt tokens, keeping only the newly generated ones.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
## Use with vllm
### Python
```bash
pip install vllm
```
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_NAME = "vitekkor/T-pro-it-1.0-bnb-8bit"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(model=MODEL_NAME, max_model_len=8192)

prompt = "Напиши стих про машинное обучение"  # "Write a poem about machine learning"
messages = [
    # System prompt: "You are T-pro, a virtual assistant at T-Technologies.
    # Your task is to be a helpful dialogue assistant."
    {"role": "system", "content": "Ты T-pro, виртуальный ассистент в Т-Технологии. Твоя задача - быть полезным диалоговым ассистентом."},
    {"role": "user", "content": prompt}
]

# Tokenize the rendered chat template and generate.
prompt_token_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```
### Server
```bash
vllm serve vitekkor/T-pro-it-1.0-bnb-8bit
```
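The server exposes an OpenAI-compatible API, by default on port 8000. A minimal client sketch, assuming the server was started as above and using the `openai` Python package (the API key is a placeholder unless the server is launched with one):
```python
from openai import OpenAI

# Point the client at the local vllm server; "EMPTY" is a placeholder key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="vitekkor/T-pro-it-1.0-bnb-8bit",
    messages=[
        # "Write a poem about machine learning"
        {"role": "user", "content": "Напиши стих про машинное обучение"}
    ],
    temperature=0.8,
    max_tokens=256,
)
print(response.choices[0].message.content)
```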