---
language: ["ru"]
tags:
- russian
- pretraining
- conversational
license: mit
widget:
- text: "[CLS] привет [SEP] привет! [SEP] как дела? [RESPONSE_TOKEN] норм"
  example_title: "Dialog example 1"
- text: "[CLS] привет [SEP] привет! [SEP] как дела? [RESPONSE_TOKEN] ты *****"
  example_title: "Dialog example 2"
---

# response-toxicity-classifier-base

[BERT classifier from Skoltech](https://huggingface.co/Skoltech/russian-inappropriate-messages), finetuned on contextual data with 4 labels.

# Training

[*Skoltech/russian-inappropriate-messages*](https://huggingface.co/Skoltech/russian-inappropriate-messages) was finetuned on multiclass data with four classes (*check the exact mapping between idx and label in* `model.config`).

1) OK label: the message is OK in context and does not intend to offend or otherwise harm the speaker's reputation.
2) Toxic label: the message might be seen as offensive in the given context.
3) Severe toxic label: the message is offensive, full of anger, and was written to provoke a fight or cause other discomfort.
4) Risks label: the message touches on sensitive topics and can harm the speaker's reputation (e.g. religion, politics).

The model was finetuned on dialog datasets that will be posted soon.

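Because the index-to-label mapping lives in `model.config` rather than in this card, it helps to look labels up by name and not hard-code positions. The sketch below is one way to do that; `rank_labels` is a hypothetical helper (not part of the model or of `transformers`) that pairs each class probability with the label name from a mapping passed in by the caller.

```python
from typing import Dict, List, Sequence, Tuple


def rank_labels(probas: Sequence[float], id2label: Dict[int, str]) -> List[Tuple[str, float]]:
    """Pair each class probability with its label name, most probable first.

    `id2label` is expected to come from the loaded model's config; no
    mapping is hard-coded here because the card does not state one.
    """
    if len(probas) != len(id2label):
        raise ValueError("expected exactly one probability per label")
    pairs = [(id2label[i], float(p)) for i, p in enumerate(probas)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Usage with a loaded model (requires downloading the checkpoint):
#   id2label = model.config.id2label   # authoritative idx -> label mapping
#   print(id2label)
#   print(rank_labels(probas, id2label))
```

Passing the mapping in explicitly keeps the helper correct even if the label order in the config differs from the order in which the classes are listed above.
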
# Evaluation results

The model achieves the following results on the validation datasets (will be posted soon):

| | OK - F1-score | TOXIC - F1-score | SEVERE TOXIC - F1-score | RISKS - F1-score |
|---------|---------------|------------------|-------------------------|------------------|
| internet dialogs | 0.896 | 0.348 | 0.490 | 0.591 |
| chatbot dialogs | 0.940 | 0.295 | 0.729 | 0.46 |

# Use in transformers

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
model = AutoModelForSequenceClassification.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')

# Special tokens are written out in the text itself, so the tokenizer must not add them again.
inputs = tokenizer(
    '[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?',
    max_length=128,
    truncation=True,  # makes max_length take effect explicitly
    add_special_tokens=False,
    return_tensors='pt',
)

with torch.inference_mode():
    logits = model(**inputs).logits
    # softmax over the class dimension; take the single example in the batch
    probas = torch.softmax(logits, dim=-1)[0].cpu().detach().numpy()
```

The work was done during an internship at Tinkoff by [Nikita Stepanov](https://huggingface.co/nikitast).
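
The input string in the usage snippet follows a fixed pattern: `[CLS]`, the context turns joined by `[SEP]`, then `[RESPONSE_TOKEN]` and the reply to classify. A minimal sketch of a helper that assembles this string is shown below; `build_input` is a hypothetical name, and it assumes the same pattern holds for any number of context turns, which this card does not state explicitly.

```python
from typing import Sequence


def build_input(context: Sequence[str], response: str) -> str:
    """Assemble a dialog into the single-string format used in the usage snippet.

    Assumption: context turns are separated by [SEP] and the reply to be
    classified follows [RESPONSE_TOKEN], as in the card's own example.
    """
    return "[CLS]" + "[SEP]".join(context) + "[RESPONSE_TOKEN]" + response


text = build_input(["привет", "привет!", "как дела?"], "норм, у тя как?")
print(text)
```

The resulting string can be passed to the tokenizer with `add_special_tokens=False`, since the special tokens are already present in the text.
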