Update README.md
README.md
CHANGED
@@ -1,3 +1,33 @@
 ---
 license: openrail++
+datasets:
+- textdetox/multilingual_toxicity_dataset
+language:
+- en
+- ru
+- uk
+- es
+- de
+- am
+- ar
+- zh
+- hi
+metrics:
+- f1
 ---
+This is an instance of [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) that was fine-tuned for binary toxicity classification on our compiled dataset [textdetox/multilingual_toxicity_dataset](https://huggingface.co/datasets/textdetox/multilingual_toxicity_dataset).
+
+First, we held out a balanced 20% test set to check the model's adequacy. Then, the model was fine-tuned on the full data. The results on the test set are as follows:
+
+|          | Precision | Recall | F1     |
+|----------|-----------|--------|--------|
+| all_lang | 0.8713    | 0.8710 | 0.8710 |
+| en       | 0.9650    | 0.9650 | 0.9650 |
+| ru       | 0.9791    | 0.9790 | 0.9790 |
+| uk       | 0.9267    | 0.9250 | 0.9251 |
+| de       | 0.8791    | 0.8760 | 0.8758 |
+| es       | 0.8700    | 0.8700 | 0.8700 |
+| ar       | 0.7787    | 0.7780 | 0.7780 |
+| am       | 0.7781    | 0.7780 | 0.7780 |
+| hi       | 0.9360    | 0.9360 | 0.9360 |
+| zh       | 0.7318    | 0.7320 | 0.7315 |
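
The README does not say how the per-language scores in the table were computed. As a rough sketch of one way to produce a table of this shape, the snippet below assumes macro-averaging over the two classes (toxic = 1, neutral = 0) and assumes the `all_lang` row pools the examples from every language. Both choices are assumptions, and the function names `binary_prf`, `macro_prf`, and `per_language_report` are hypothetical, not taken from the authors' code.

```python
from collections import defaultdict


def binary_prf(y_true, y_pred, positive=1):
    """Precision, recall and F1 with `positive` treated as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_prf(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class scores (assumed averaging scheme)."""
    per_class = [binary_prf(y_true, y_pred, positive=c) for c in labels]
    return tuple(sum(scores[i] for scores in per_class) / len(labels) for i in range(3))


def per_language_report(rows):
    """rows: iterable of (language, gold_label, predicted_label) triples.

    Returns {language: (precision, recall, f1)} plus a pooled "all_lang" entry.
    """
    by_lang = defaultdict(lambda: ([], []))
    for lang, gold, pred in rows:
        for key in (lang, "all_lang"):  # pooling for all_lang is an assumption
            by_lang[key][0].append(gold)
            by_lang[key][1].append(pred)
    return {lang: macro_prf(gold, pred) for lang, (gold, pred) in by_lang.items()}
```

Given gold labels and model predictions for the held-out 20% split, `per_language_report` returns one `(precision, recall, f1)` tuple per language code, which can then be rounded to four decimals for display.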