---
license: openrail++
datasets:
- textdetox/multilingual_toxicity_dataset
language:
- en
- ru
- uk
- es
- de
- am
- ar
- zh
- hi
metrics:
- f1
---
This is an instance of [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) that was fine-tuned on a binary toxicity classification task using our compiled dataset [textdetox/multilingual_toxicity_dataset](https://huggingface.co/datasets/textdetox/multilingual_toxicity_dataset).
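A minimal inference sketch with the `transformers` pipeline API. The repo ID below is an assumption for illustration; replace it with this model's actual Hub ID:

```python
from transformers import pipeline

# NOTE: the model ID is a placeholder assumption; substitute this repository's ID.
classifier = pipeline(
    "text-classification",
    model="textdetox/xlmr-large-toxicity-classifier",
)

# Returns a list of {"label": ..., "score": ...} dicts, one per input text.
result = classifier("You are a wonderful person!")
print(result)
```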

First, we held out a balanced 20% test set to verify the model's adequacy. Then, the model was fine-tuned on the full data. The results on the test set are as follows:

|          | Precision | Recall | F1    |
|----------|-----------|--------|-------|
| all_lang | 0.8713    | 0.8710 | 0.8710|
| en       | 0.9650    | 0.9650 | 0.9650|
| ru       | 0.9791    | 0.9790 | 0.9790|
| uk       | 0.9267    | 0.9250 | 0.9251|
| de       | 0.8791    | 0.8760 | 0.8758|
| es       | 0.8700    | 0.8700 | 0.8700|
| ar       | 0.7787    | 0.7780 | 0.7780|
| am       | 0.7781    | 0.7780 | 0.7780|
| hi       | 0.9360    | 0.9360 | 0.9360|
| zh       | 0.7318    | 0.7320 | 0.7315|
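The precision, recall, and F1 values above are standard binary-classification metrics (the table likely reports averaged values across both classes). A pure-Python sketch of the per-class computation on toy labels, which are illustrative only and not from the actual test set:

```python
# Toy binary toxicity labels: 1 = toxic, 0 = non-toxic (illustrative data only).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)                        # fraction of predicted-toxic that are toxic
recall = tp / (tp + fn)                           # fraction of actual-toxic that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, round(f1, 4))  # 1.0 0.75 0.8571
```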