---
license: openrail++
datasets:
- textdetox/multilingual_toxicity_dataset
language:
- en
- ru
- uk
- es
- de
- am
- ar
- zh
- hi
metrics:
- f1
---
This is an instance of xlm-roberta-large fine-tuned on a binary toxicity classification task using our compiled dataset textdetox/multilingual_toxicity_dataset.
First, we held out a balanced 20% test set to check the model's adequacy. The model was then fine-tuned on the full data. The results on the test set are as follows:
| Language | Precision | Recall | F1 |
|---|---|---|---|
| all_lang | 0.8713 | 0.8710 | 0.8710 |
| en | 0.9650 | 0.9650 | 0.9650 |
| ru | 0.9791 | 0.9790 | 0.9790 |
| uk | 0.9267 | 0.9250 | 0.9251 |
| de | 0.8791 | 0.8760 | 0.8758 |
| es | 0.8700 | 0.8700 | 0.8700 |
| ar | 0.7787 | 0.7780 | 0.7780 |
| am | 0.7781 | 0.7780 | 0.7780 |
| hi | 0.9360 | 0.9360 | 0.9360 |
| zh | 0.7318 | 0.7320 | 0.7315 |
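
The checkpoint can be loaded like any other sequence-classification model in the `transformers` library. Below is a minimal usage sketch, not an official snippet from this card: the repository ID `textdetox/xlmr-large-toxicity-classifier` and the label order (index 0 = non-toxic, index 1 = toxic) are assumptions and should be replaced with this repository's actual ID and label mapping.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical repository ID; substitute the actual one for this model card.
model_id = "textdetox/xlmr-large-toxicity-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

texts = ["You are wonderful!", "I can't stand you, idiot."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Binary classification: label order (0 = non-toxic, 1 = toxic) is assumed here.
probs = torch.softmax(logits, dim=-1)
for text, p in zip(texts, probs):
    print(f"{text!r} -> toxicity probability: {p[1].item():.3f}")
```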