---
language: it
license: gpl-3.0
tags:
- text classification
- abusive language
- hate speech
- offensive language

widget:
- text: "Ci sono dei bellissimi capibara!"
  example_title: "Hate Speech Classification 1"
- text: "Sei una testa di cazzo!!"
  example_title: "Hate Speech Classification 2"
- text: "Ti odio!"
  example_title: "Hate Speech Classification 3"


---



#
[Debora Nozza](http://dnozza.github.io/) •
[Federico Bianchi](https://federicobianchi.io/) •
[Giuseppe Attanasio](https://gattanasio.cc/)


# HATE-ITA Base 
HATE-ITA is a binary hate speech classification model for Italian social media text.

<img src="https://raw.githubusercontent.com/MilaNLProc/hate-ita/main/hateita.png?token=GHSAT0AAAAAABTEBAJ4PNDWAMU3KKIGUOCSYWG4IBA" width="200">

## Abstract

Online hate speech is a dangerous phenomenon that can (and should) be promptly counteracted properly. While Natural Language Processing has been successfully used for the purpose, many of the research efforts are directed toward the English language. This choice severely limits the classification power in non-English languages. In this paper, we test several learning frameworks for identifying hate speech in Italian text. We release **HATE-ITA, a set of multi-language models trained on a large set of English data and available Italian datasets**. HATE-ITA performs better than mono-lingual models and seems to adapt well also on language-specific slurs. We believe our findings will encourage research in other mid-to-low resource communities and provide a valuable benchmarking tool for the Italian community.

## Model

This model is a fine-tuned version of the [XLM-T](https://arxiv.org/abs/2104.12250) model.

| Model                       | Download |
| ------                      | -------------------------|
| `hate-ita` | [Link](https://huggingface.co/MilaNLProc/hate-ita) |
| `hate-ita-xlm-r-base`   | [Link](https://huggingface.co/MilaNLProc/hate-ita-xlm-r-base) |
| `hate-ita-xlm-r-large`   | [Link](https://huggingface.co/MilaNLProc/hate-ita-xlm-r-large) |

## Results

This model had an F1 of 0.83 on the test set.

## Usage

```python
from transformers import pipeline

# top_k=2 returns the scores for both classes, not only the top one
classifier = pipeline("text-classification", model="MilaNLProc/hate-ita", top_k=2)
prediction = classifier("ti odio")  # Italian for "I hate you"
print(prediction)
```
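
The pipeline returns one score per class, so a binary decision still needs a threshold. The following is a minimal sketch of that post-processing step, not part of the released code. The helper names `hate_score` and `is_hateful`, the label names `"hate"` and `"non-hate"`, and the 0.5 threshold are assumptions made for illustration; check the model's `config.id2label` for the actual label names. The scores in the example are placeholders, not real model output.

```python
def hate_score(scores, hate_label="hate"):
    """Return the score assigned to `hate_label` for one classified text.

    `scores` is a list of {"label": str, "score": float} dicts, the shape
    produced by a text-classification pipeline with top_k=2.
    The label name "hate" is an assumption, not confirmed by the model card.
    """
    # Depending on the transformers version, the result for a single text
    # may arrive wrapped in an extra list; unwrap it if so.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    by_label = {item["label"]: item["score"] for item in scores}
    if hate_label not in by_label:
        raise KeyError(f"label {hate_label!r} not found in {sorted(by_label)}")
    return by_label[hate_label]


def is_hateful(scores, threshold=0.5, hate_label="hate"):
    """Binary decision: True when the hate score reaches the threshold."""
    return hate_score(scores, hate_label) >= threshold


# Placeholder scores for illustration only, not real model output.
example = [{"label": "hate", "score": 0.9}, {"label": "non-hate", "score": 0.1}]
print(is_hateful(example))                  # True
print(is_hateful(example, threshold=0.95))  # False
print(is_hateful([example]))                # True (nested shape is unwrapped)
```

Raising the threshold trades recall for precision, which may be preferable when flagged content triggers an automatic action.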

## Citation
Please use the following BibTeX entry if you use this model in your project:
```
@inproceedings{nozza-etal-2022-hate-ita,
    title = {{HATE-ITA}: Hate Speech Detection in Italian Social Media Text},
    author = "Nozza, Debora and Bianchi, Federico and Attanasio, Giuseppe",
    booktitle = "Proceedings of the 6th Workshop on Online Abuse and Harms",
    year = "2022",
    publisher = "Association for Computational Linguistics"
}
```

## Ethical Statement
While promising, the results in this work should not be interpreted as a definitive assessment of the performance of hate speech detection in Italian. We are unsure whether our model can maintain a stable and fair precision across the different targets and categories. HATE-ITA might overlook some sensitive details, which practitioners should treat with care.

## License 
[GNU GPLv3](https://choosealicense.com/licenses/gpl-3.0/)