---
license: openrail++
datasets:
- textdetox/multilingual_toxicity_dataset
language:
- en
- ru
- uk
- es
- de
- am
- ar
- zh
- hi
metrics:
- f1
---
This is an instance of xlm-roberta-large fine-tuned on a binary toxicity classification task using our compiled dataset textdetox/multilingual_toxicity_dataset.
First, we held out a balanced 20% test set to check the model's adequacy. The model was then fine-tuned on the full data. The results on the test set are as follows:
| Language | Precision | Recall | F1 |
|---|---|---|---|
| all_lang | 0.8713 | 0.8710 | 0.8710 |
| en | 0.9650 | 0.9650 | 0.9650 |
| ru | 0.9791 | 0.9790 | 0.9790 |
| uk | 0.9267 | 0.9250 | 0.9251 |
| de | 0.8791 | 0.8760 | 0.8758 |
| es | 0.8700 | 0.8700 | 0.8700 |
| ar | 0.7787 | 0.7780 | 0.7780 |
| am | 0.7781 | 0.7780 | 0.7780 |
| hi | 0.9360 | 0.9360 | 0.9360 |
| zh | 0.7318 | 0.7320 | 0.7315 |
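
The checkpoint can be loaded like any other sequence-classification model in the `transformers` library. Below is a minimal usage sketch, not an official snippet from this card: the repository ID `textdetox/xlmr-large-toxicity-classifier` and the label order (index 0 = non-toxic, index 1 = toxic) are assumptions and should be replaced with this repository's actual ID and label mapping.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical repository ID; substitute the actual one for this model card.
model_id = "textdetox/xlmr-large-toxicity-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

texts = ["You are wonderful!", "I can't stand you, idiot."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Binary classification: label order (0 = non-toxic, 1 = toxic) is assumed here.
probs = torch.softmax(logits, dim=-1)
for text, p in zip(texts, probs):
    print(f"{text!r} -> toxicity probability: {p[1].item():.3f}")
```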