|
---
language: ["ru"]
tags:
- russian
- pretraining
license: mit
---
|
|
|
# dialog-inapropriate-messages-classifier |
|
|
|
A [BERT classifier from Skoltech](https://huggingface.co/Skoltech/russian-inappropriate-messages), fine-tuned on contextual data with four labels.
|
|
|
# Training |
|
|
|
*Skoltech/russian-inappropriate-messages* was fine-tuned on multiclass data with four classes:
|
|
|
1) OK label -- the message is acceptable in context and is not intended to offend or otherwise harm the speaker's reputation.
2) Toxic label -- the message might be seen as offensive in the given context.
3) Severe toxic label -- the message is offensive, full of anger, and written to provoke a fight or other discomfort.
4) Risks label -- the message touches on sensitive topics (e.g. religion, politics) and can harm the speaker's reputation.
|
|
|
The model was fine-tuned on DATASET_LINK.
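A minimal inference sketch is given below. The repository id and the index-to-label mapping are assumptions for illustration only; check the published model's `id2label` config before relying on them.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical repo id -- replace with the actual model path.
MODEL_NAME = "your-org/dialog-inapropriate-messages-classifier"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

# Assumed label order; verify against the model config's id2label mapping.
LABELS = ["OK", "TOXIC", "SEVERE TOXIC", "RISKS"]

def classify(message: str) -> str:
    """Return the predicted label for a single message."""
    inputs = tokenizer(message, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[logits.argmax(dim=-1).item()]

print(classify("Пример сообщения"))  # e.g. "OK"
```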
|
|
|
# Evaluation results |
|
|
|
The model achieves the following results:
|
|
|
| Dataset                 | Label        | Precision | Recall | F1-score |
|-------------------------|--------------|-----------|--------|----------|
| DATASET_TWITTER val.csv | OK           | 0.883     | 0.913  | 0.896    |
| DATASET_TWITTER val.csv | TOXIC        | 0.368     | 0.330  | 0.348    |
| DATASET_TWITTER val.csv | SEVERE TOXIC | 0.515     | 0.468  | 0.490    |
| DATASET_TWITTER val.csv | RISKS        | 0.659     | 0.535  | 0.591    |
| DATASET_GENA val.csv    | OK           | 0.953     | 0.927  | 0.940    |
| DATASET_GENA val.csv    | TOXIC        | 0.260     | 0.343  | 0.295    |
| DATASET_GENA val.csv    | SEVERE TOXIC | 0.666     | 0.806  | 0.729    |
| DATASET_GENA val.csv    | RISKS        | 0.523     | 0.423  | 0.460    |
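Per-class precision, recall, and F1 of this kind can be reproduced with a sketch like the one below. The `val.csv` column names (`text`, `label`) are assumptions, and `classify()` refers to the inference sketch above.

```python
import pandas as pd
from sklearn.metrics import classification_report

df = pd.read_csv("val.csv")  # assumed columns: "text", "label"
y_true = df["label"].tolist()
y_pred = [classify(text) for text in df["text"]]  # classify() from the sketch above

# Prints per-class precision, recall, and F1-score.
print(classification_report(
    y_true, y_pred,
    labels=["OK", "TOXIC", "SEVERE TOXIC", "RISKS"],
    digits=3,
))
```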
|
|
|
This work was done during an internship at Tinkoff.
|
|