Update README.md
README.md CHANGED
@@ -14,28 +14,42 @@ widget:
example_title: "Dialog example 3"
---

- #

[BERT classifier from Skoltech](https://huggingface.co/Skoltech/russian-inappropriate-messages), finetuned on contextual data with 4 labels.

# Training

- *Skoltech/russian-inappropriate-messages* was finetuned on multiclass data with four classes

1) OK label -- the message is OK in context and does not intend to offend or otherwise harm the reputation of the speaker.
2) Toxic label -- the message might be seen as offensive in the given context.
3) Severe toxic label -- the message is offensive, full of anger, and written to provoke a fight or other discomfort.
4) Risks label -- the message touches on sensitive topics and can harm the reputation of the speaker (e.g. religion, politics).

- The model was finetuned on

# Evaluation results

- The model achieves the following results:
- | | OK - F1-score | TOXIC - F1-score | SEVERE TOXIC - F1-score | RISKS - F1-score |
- |-------------------------|---------------|------------------|-------------------------|------------------|
- | DATASET_TWITTER val.csv | 0.896         | 0.348            | 0.490                   | 0.591            |
- | DATASET_GENA val.csv    | 0.940         | 0.295            | 0.729                   | 0.46             |

The work was done during an internship at Tinkoff by [Nikita Stepanov](https://huggingface.co/nikitast).
example_title: "Dialog example 3"
---

+ # response-toxicity-classifier-base

[BERT classifier from Skoltech](https://huggingface.co/Skoltech/russian-inappropriate-messages), finetuned on contextual data with 4 labels.

# Training

+ *Skoltech/russian-inappropriate-messages* was finetuned on multiclass data with four classes (*check the exact mapping between idx and label in* `model.config`; see the sketch after the list).

1) OK label -- the message is OK in context and does not intend to offend or otherwise harm the reputation of the speaker.
2) Toxic label -- the message might be seen as offensive in the given context.
3) Severe toxic label -- the message is offensive, full of anger, and written to provoke a fight or other discomfort.
4) Risks label -- the message touches on sensitive topics and can harm the reputation of the speaker (e.g. religion, politics).
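
The note above defers the index-to-label mapping to `model.config`; a minimal sketch of inspecting it, assuming only the standard `transformers` config API (the actual label names are whatever the checkpoint's config ships with):

```python
from transformers import AutoConfig

# id2label is the conventional transformers config field mapping class index -> label name.
config = AutoConfig.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
print(config.id2label)  # actual names come from the checkpoint's config
```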

+ The model was finetuned on a soon-to-be-posted dataset of dialogs.

# Evaluation results

+ The model achieves the following results on the validation datasets (to be posted soon):
+
+ | OK - F1-score | TOXIC - F1-score | SEVERE TOXIC - F1-score | RISKS - F1-score |
+ |---------------|------------------|-------------------------|------------------|
+ | 0.896         | 0.348            | 0.490                   | 0.591            |
+ | 0.940         | 0.295            | 0.729                   | 0.46             |
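
Once those validation sets are public, per-class F1 like the table above can be reproduced with scikit-learn; a sketch assuming gold and predicted class ids are already collected as integer lists (the arrays and the id-to-label order below are made up for illustration):

```python
from sklearn.metrics import f1_score

# Hypothetical gold and predicted class ids (assumed order: 0=OK, 1=TOXIC, 2=SEVERE TOXIC, 3=RISKS).
y_true = [0, 0, 1, 2, 3, 0, 2]
y_pred = [0, 1, 1, 2, 3, 0, 0]

# average=None returns one F1 value per class, in the order given by `labels`.
print(f1_score(y_true, y_pred, labels=[0, 1, 2, 3], average=None))
```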
+
+ # Use in transformers
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ tokenizer = AutoTokenizer.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
+ model = AutoModelForSequenceClassification.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
+
+ # Dialog turns are separated by [SEP] and the response to classify follows
+ # [RESPONSE_TOKEN]; [CLS] is prepended by hand since add_special_tokens=False.
+ # The Russian example reads: "hi" / "hi!" / "how are you?" -> "fine, how about you?"
+ inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, add_special_tokens=False, return_tensors='pt')
+
+ with torch.inference_mode():
+     logits = model(**inputs).logits
+     probas = torch.sigmoid(logits)[0].cpu().detach().numpy()
+ ```
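
Continuing straight from the snippet above, the scores in `probas` follow class-index order, so they can be keyed by name; a hedged follow-up, again assuming the conventional `id2label` entry in the config:

```python
# Attach label names to the sigmoid scores; names depend on the shipped config.
scores = {model.config.id2label[i]: float(p) for i, p in enumerate(probas)}
print(scores)
```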

The work was done during an internship at Tinkoff by [Nikita Stepanov](https://huggingface.co/nikitast).