---
license: cc-by-4.0
language:
- it
---
# GeNTE Evaluator

The **Gender-Neutral Translation (GeNTE) Evaluator** is a sequence classification model for evaluating inclusive rewriting and translation into Italian with the [GeNTE corpus](https://huggingface.co/datasets/FBK-MT/GeNTE).
It was built by fine-tuning the RoBERTa-based [UmBERTo model](https://huggingface.co/Musixmatch/umberto-wikipedia-uncased-v1).

More details on the training process and on reproducibility are available in the [official repository](https://github.com/hlt-mt/fbk-NEUTR-evAL/blob/main/solutions/GeNTE.md) and in the [paper](https://aclanthology.org/2024.eacl-short.23/).

## Usage

You can use the GeNTE Evaluator as follows:

```
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# load the tokenizer of UmBERTo
tokenizer = AutoTokenizer.from_pretrained("Musixmatch/umberto-wikipedia-uncased-v1", do_lower_case=False)

# load GeNTE Evaluator
model = AutoModelForSequenceClassification.from_pretrained("FBK-MT/GeNTE-evaluator")

# neutral example (adjacent string literals are joined into a single sentence)
sample = (
    "Condividiamo il parere di chi ha presentato la relazione "
    "che ha posto notevole enfasi sull'informazione in relazione ai rischi e sulla trasparenza, "
    "in particolare nel campo sanitario e della sicurezza."
)
inputs = tokenizer(sample, return_tensors="pt")

# the model returns raw logits; argmax selects the predicted class
with torch.no_grad():
    logits = model(**inputs).logits

predicted_label = torch.argmax(logits, dim=1).item()
print(predicted_label)  # 0 is neutral, 1 is gendered
```
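
The snippet above prints only the argmax label for a single sentence. As a minimal sketch of how such outputs might be summarised over several sentences, the pure-Python helpers below turn logits into probabilities and compute the share of sentences predicted as neutral. The names `softmax`, `classify`, and `neutral_share`, as well as the example logits, are hypothetical and introduced here for illustration only; this is not the evaluation procedure from the paper or the repository.

```python
import math

# Label ids as documented in the usage snippet: 0 is neutral, 1 is gendered.
LABELS = {0: "neutral", 1: "gendered"}


def softmax(logits):
    """Turn one row of raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label name, probability of that label) for one row of logits."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


def neutral_share(rows):
    """Fraction of rows whose predicted label is 'neutral' (0.0 if empty)."""
    if not rows:
        return 0.0
    neutral = sum(1 for row in rows if classify(row)[0] == "neutral")
    return neutral / len(rows)


# Hypothetical logits, invented for illustration (not real model outputs).
example_rows = [[2.0, -1.0], [-0.5, 1.5], [0.3, 0.1]]

for row in example_rows:
    label, prob = classify(row)
    print(f"{label} ({prob:.2f})")
print(neutral_share(example_rows))
```

With the real model, the rows could come from `model(**inputs).logits.tolist()`, assuming a batch of sentences tokenized with `padding=True`.
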

## Citation

```
@inproceedings{savoldi-etal-2024-prompt,
    title = "A Prompt Response to the Demand for Automatic Gender-Neutral Translation",
    author = "Savoldi, Beatrice  and
      Piergentili, Andrea  and
      Fucci, Dennis  and
      Negri, Matteo  and
      Bentivogli, Luisa",
    editor = "Graham, Yvette  and
      Purver, Matthew",
    booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = mar,
    year = "2024",
    address = "St. Julian{'}s, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.eacl-short.23",
    pages = "256--267",
    abstract = "Gender-neutral translation (GNT) that avoids biased and undue binary assumptions is a pivotal challenge for the creation of more inclusive translation technologies. Advancements for this task in Machine Translation (MT), however, are hindered by the lack of dedicated parallel data, which are necessary to adapt MT systems to satisfy neutral constraints. For such a scenario, large language models offer hitherto unforeseen possibilities, as they come with the distinct advantage of being versatile in various (sub)tasks when provided with explicit instructions. In this paper, we explore this potential to automate GNT by comparing MT with the popular GPT-4 model. Through extensive manual analyses, our study empirically reveals the inherent limitations of current MT systems in generating GNTs and provides valuable insights into the potential and challenges associated with prompting for neutrality.",
}
```

## Contributions

Thanks to [@dfucci](https://huggingface.co/dfucci) for adding this model.