Update README.md
README.md CHANGED
@@ -14,28 +14,42 @@ widget:
example_title: "Dialog example 3"
---

- #

[BERT classifier from Skoltech](https://huggingface.co/Skoltech/russian-inappropriate-messages), finetuned on contextual data with 4 labels.

# Training

- *Skoltech/russian-inappropriate-messages* was finetuned on multiclass data with four classes

1) OK label -- the message is OK in context and does not intend to offend or otherwise harm the reputation of the speaker.
2) Toxic label -- the message might be seen as offensive in the given context.
3) Severe toxic label -- the message is offensive, full of anger, and written to provoke a fight or other discomfort.
4) Risks label -- the message touches on sensitive topics and can harm the reputation of the speaker (e.g. religion, politics).

- The model was finetuned on

# Evaluation results

- The model achieves the following results:
- | | OK - F1-score | TOXIC - F1-score | SEVERE TOXIC - F1-score | RISKS - F1-score |
- |-------------------------|---------------|------------------|-------------------------|------------------|
- | DATASET_TWITTER val.csv | 0.896         | 0.348            | 0.490                   | 0.591            |
- | DATASET_GENA val.csv    | 0.940         | 0.295            | 0.729                   | 0.46             |

The work was done during an internship at Tinkoff by [Nikita Stepanov](https://huggingface.co/nikitast).
example_title: "Dialog example 3"
---

+ # response-toxicity-classifier-base

[BERT classifier from Skoltech](https://huggingface.co/Skoltech/russian-inappropriate-messages), finetuned on contextual data with 4 labels.

# Training

+ *Skoltech/russian-inappropriate-messages* was finetuned on multiclass data with four classes (*check the exact mapping between idx and label in* `model.config`; see the sketch after the list).

1) OK label -- the message is OK in context and does not intend to offend or otherwise harm the reputation of the speaker.
2) Toxic label -- the message might be seen as offensive in the given context.
3) Severe toxic label -- the message is offensive, full of anger, and written to provoke a fight or other discomfort.
4) Risks label -- the message touches on sensitive topics and can harm the reputation of the speaker (e.g. religion, politics).
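
The note above defers the index-to-label mapping to `model.config`; a minimal sketch of inspecting it, assuming only the standard `transformers` config API (the actual label names are whatever the checkpoint's config ships with):

```python
from transformers import AutoConfig

# id2label is the conventional transformers config field mapping class index -> label name.
config = AutoConfig.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
print(config.id2label)  # actual names come from the checkpoint's config
```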

+ The model was finetuned on a soon-to-be-posted dataset of dialogs.

# Evaluation results

+ The model achieves the following results on the validation datasets (to be posted soon):
+
+ | OK - F1-score | TOXIC - F1-score | SEVERE TOXIC - F1-score | RISKS - F1-score |
+ |---------------|------------------|-------------------------|------------------|
+ | 0.896         | 0.348            | 0.490                   | 0.591            |
+ | 0.940         | 0.295            | 0.729                   | 0.46             |
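
Once those validation sets are public, per-class F1 like the table above can be reproduced with scikit-learn; a sketch assuming gold and predicted class ids are already collected as integer lists (the arrays and the id-to-label order below are made up for illustration):

```python
from sklearn.metrics import f1_score

# Hypothetical gold and predicted class ids (assumed order: 0=OK, 1=TOXIC, 2=SEVERE TOXIC, 3=RISKS).
y_true = [0, 0, 1, 2, 3, 0, 2]
y_pred = [0, 1, 1, 2, 3, 0, 0]

# average=None returns one F1 value per class, in the order given by `labels`.
print(f1_score(y_true, y_pred, labels=[0, 1, 2, 3], average=None))
```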
+
+ # Use in transformers
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ tokenizer = AutoTokenizer.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
+ model = AutoModelForSequenceClassification.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
+
+ # Dialog turns are separated by [SEP] and the response to classify follows
+ # [RESPONSE_TOKEN]; [CLS] is prepended by hand since add_special_tokens=False.
+ # The Russian example reads: "hi" / "hi!" / "how are you?" -> "fine, how about you?"
+ inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, add_special_tokens=False, return_tensors='pt')
+
+ with torch.inference_mode():
+     logits = model(**inputs).logits
+     probas = torch.sigmoid(logits)[0].cpu().detach().numpy()
+ ```
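
Continuing straight from the snippet above, the scores in `probas` follow class-index order, so they can be keyed by name; a hedged follow-up, again assuming the conventional `id2label` entry in the config:

```python
# Attach label names to the sigmoid scores; names depend on the shipped config.
scores = {model.config.id2label[i]: float(p) for i, p in enumerate(probas)}
print(scores)
```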

The work was done during an internship at Tinkoff by [Nikita Stepanov](https://huggingface.co/nikitast).