---
language:
- ru
---
# T-lite-instruct-0.1
**🚨 T-lite is designed for further fine-tuning and is not intended as a ready-to-use conversational assistant. Users are advised to exercise caution and are responsible for any additional training and oversight required to ensure the model's responses meet acceptable ethical and safety standards. The responsibility for incorporating this model into industrial or commercial solutions lies entirely with those who choose to deploy it.**
## Description
T-lite-instruct-0.1 is an instruct version of the T-lite-0.1 model.
T-lite-instruct-0.1 was trained in bf16.
### 📚 Dataset
#### Contexts
For the instruction dataset, the contexts are obtained from:
- Open-source English-language datasets (such as UltraFeedback, HelpSteer, and SHP)
- Machine translations of English-language datasets
- Synthetic grounded QA contexts, generated from pre-training datasets
The translated contexts are filtered using classifiers.
#### SFT
Responses to the contexts are generated by a strong model, and training is carried out exclusively on these responses. This avoids training the model on poor-quality translations.
#### Reward Modeling
The reward model (RM) is trained on the following preference pairs:
- Strong Model > Our Model
- Stronger Model > Weaker Model
- Chosen Translated Response > Rejected Translated Response
- Pairs from original English datasets
The translated preference data are first filtered by the RM ensemble.
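The card does not state the objective used to train the RM. A common choice for preference pairs like the ones listed above is the Bradley-Terry pairwise loss, `-log(sigmoid(r_chosen - r_rejected))`. The sketch below illustrates that loss under this assumption; the function name and the example scores are hypothetical and are not taken from the T-lite training code.

```python
import math


def pairwise_rm_loss(chosen_score: float, rejected_score: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(chosen - rejected)).

    Assumption: the RM emits one scalar score per response. The card
    does not confirm that this loss was used for T-lite.
    """
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical scores for one "Strong Model > Our Model" pair.
well_ordered = pairwise_rm_loss(chosen_score=2.0, rejected_score=0.0)
mis_ordered = pairwise_rm_loss(chosen_score=0.0, rejected_score=2.0)
print(well_ordered < mis_ordered)
```

The loss shrinks as the RM scores the preferred response further above the rejected one, which is the ordering each pair type in the list encodes.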
#### Preference tuning
Preference tuning was carried out in two stages:
- Stage 1: SPiN on the responses of the teacher model (Strong Model > Our Model)
- Stage 2: SLiC-HF using our RM
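For readers unfamiliar with Stage 2, the following is a minimal sketch of the SLiC-HF objective as it is usually written: a hinge (rank calibration) term on the sequence log-probabilities of a preferred and a rejected response, plus a regularization term that keeps the policy close to a reference response. How the pair is selected with the RM, the margin `delta`, and the weight `reg_weight` are assumptions for illustration; the card does not give these details.

```python
from typing import Callable, Sequence, Tuple


def slic_hf_loss(
    logp_chosen: float,
    logp_rejected: float,
    logp_reference: float,
    delta: float = 1.0,       # hypothetical margin
    reg_weight: float = 0.5,  # hypothetical regularization weight
) -> float:
    """Hinge calibration loss plus a cross-entropy regularizer.

    All inputs are sequence log-probabilities under the current policy.
    """
    calibration = max(0.0, delta - logp_chosen + logp_rejected)
    regularization = -reg_weight * logp_reference
    return calibration + regularization


def pick_pair(
    candidates: Sequence[str], rm_score: Callable[[str], float]
) -> Tuple[str, str]:
    """Return (best, worst) candidates according to a reward model score."""
    ranked = sorted(candidates, key=rm_score)
    return ranked[-1], ranked[0]


# Toy usage: string length stands in for a real RM score.
chosen, rejected = pick_pair(["a", "bbb", "cc"], rm_score=len)
print(chosen, rejected)
print(slic_hf_loss(logp_chosen=-1.0, logp_rejected=-3.0, logp_reference=-2.0))
```

The calibration term is zero once the policy prefers the chosen response by at least `delta` in log-probability, so only pairs that are still mis-ranked or too close contribute gradient.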
## 📊 Benchmarks
Here we present the results of T-lite-instruct-0.1 on automatic benchmarks.
### 🏆 [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench)
The benchmark was carefully translated into Russian and evaluated with the [LLM Judge](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) codebase, using gpt-4-1106-preview as the judge.
| MT-Bench | Total | Turn_1 | Turn_2 | coding | humanities | math | reasoning | roleplay | stem | writing |
|-----------------------------------------------------------------|:-----------:|:------------:|:------------:|:------:|:----------:|:----:|:---------:|:--------:|:----:|:-------:|
| **T-lite-instruct-0.1** | **6.458** | **6.833** | 6.078 | 4.136 | **8.45** | 4.25 | **4.5** |**7.667** |**7.7**| 7.706 |
| gpt3.5-turbo-0125 | 6.373 | 6.423 | **6.320** |**6.519**| 7.474 | 4.75 | 4.15 | 6.333 | 6.7 | 7.588 |
| suzume-llama-3-8B-multilingual-orpo-borda-half | 6.051 | 6.577 | 5.526 | 4.318 | 8.0 | 4.0 | 3.6 | 7.056 | 6.7 | **7.889** |
| Qwen2-7b-Instruct | 6.026 | 6.449 | 5.603 | 5.0 | 6.95 |**5.8**| 4.15 | 7.167 | 5.85 | 7.278 |
| Llama-3-8b-Instruct | 5.948 | 6.662 | 5.224 | 4.727 | 7.8 | 3.9 | 2.8 | 7.333 | 6.053 | 7.0 |
| suzume-llama-3-8B-multilingual | 5.808 | 6.167 | 5.449 | 5.409 | 6.4 | 5.05 | 3.8 | 6.556 | 5.0 | 7.056 |
| saiga_llama3_8b | 5.471 | 5.896 | 5.039 | 3.0 | 7.4 | 3.55 | 3.5 | 6.444 | 5.15 | 7.812 |
| Mistral-7B-Instruct-v0.3 | 5.135 | 5.679 | 4.584 | 4.045 | 6.35 | 3.15 | 3.2 | 5.765 | 5.2 | 7.333 |
### 🏟️ [Arena](https://github.com/lm-sys/arena-hard-auto)
We used the Russian version of the Arena benchmark from [Vikhrmodels](https://huggingface.co/datasets/Vikhrmodels/ru-arena-general) and the [Arena Hard Auto](https://github.com/lm-sys/arena-hard-auto) codebase for evaluation. The baseline model was gpt3.5-turbo-0125, and the judge was gpt-4-1106-preview.
| Arena General | Score | 95% CI | Average Tokens |
|-----------------------------------------------------------------|:-----------:|:------------:|:--------------:|
| **T-lite-instruct-0.1** | **57.26** | -2.9/2 | 870 |
| gpt3.5-turbo-0125 | 50 | 0/0 | 254 |
| suzume-llama-3-8B-multilingual-orpo-borda-half | 47.17 | -2.6/2.4 | 735 |
| Llama-3-8b-Instruct | 42.16 | -2.1/2.1 | 455 |
| saiga_llama3_8b | 39.88 | -2.3/2.5 | 616 |
| suzume-llama-3-8B-multilingual | 38.25 | -1.7/1.7 | 625 |
| Qwen2-7b-Instruct | 33.42 | -1.9/2.2 | 365 |
| Mistral-7B-Instruct-v0.3 | 28.11 | -2/2.2 | 570 |
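The 95% CI column appears to list lower/upper offsets relative to the score rather than absolute bounds. Under that assumption, the small helper below (a hypothetical name, not part of Arena Hard Auto) converts a table row into an absolute interval, which makes it easier to check whether two models' intervals overlap.

```python
from typing import Tuple


def ci_bounds(score: float, ci: str) -> Tuple[float, float]:
    """Convert a 'lower/upper' offset string into absolute bounds.

    Assumption: both offsets are relative to `score`, with the lower
    offset written as a negative number (for example "-2.9/2").
    """
    lower_offset, upper_offset = (float(part) for part in ci.split("/"))
    return round(score + lower_offset, 2), round(score + upper_offset, 2)


# Values copied from the table above.
t_lite = ci_bounds(57.26, "-2.9/2")
suzume_orpo = ci_bounds(47.17, "-2.6/2.4")

# True when the T-lite interval lies entirely above the other interval.
print(t_lite[0] > suzume_orpo[1])
```

Read this way, the lower bound for T-lite-instruct-0.1 stays above the upper bound of the next open model in the table, so the two intervals do not overlap.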
## 👨‍💻 Examples of usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

torch.manual_seed(42)  # fix the random seed for reproducibility

model_name = "t-bank-ai/T-lite-instruct-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Russian prompt: "Write a recipe for an awesome pizza!"
messages = [
    {"role": "user", "content": "Напиши рецепт классной пиццы!"},
]

# Render the conversation with the model's chat template and tokenize it
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the regular EOS token or the end-of-turn token
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Output (in Russian; it stops mid-word, most likely at the `max_new_tokens=256` limit):
```
Конечно, вот рецепт для вкусной домашней пиццы, который можно адаптировать под разные вкусы и предпочтения. Важно, чтобы тесто было мягким и воздушным, а начинка – сочной и ароматной.
### Ингредиенты для теста:
- 500 г муки (лучше использовать смесь пшеничной и цельнозерновой)
- 1 ч. л. сухих дрожжей (или 7 г свежих)
- 1 ч. л. сахара
- 1 ч. л. соли
- 1 ст. л. оливкового масла
- 300 мл тёплой воды
- 1 яйцо (для смазки)
### Ингредиенты для начинки (примерный набор):
- 200 г томатного соуса (можно сделать самому из свежих помидоров или использовать готовый)
- 200 г моцареллы, нарезанной ломтиками
- 100 г сыра пармезан (тертый)
- 100 г ветчины или колбасы
- 100 г грибов (шампин
```