---
language:
- multilingual
- en
- ar
- bg
- de
- el
- es
- fr
- hi
- ru
- sw
- th
- tr
- ur
- vi
- zh
license: mit
tags:
- zero-shot-classification
- text-classification
- nli
- pytorch
metrics:
- accuracy
datasets:
- multi_nli
- xnli
pipeline_tag: zero-shot-classification
widget:
- text: "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
  candidate_labels: "politics, economy, entertainment, environment"
---
# Multilingual XLM-V-base-mnli-xnli

## Model description
This multilingual model can perform natural language inference (NLI) on 116 languages and is therefore also suitable for multilingual zero-shot classification. The underlying XLM-V-base model was created by Meta AI and pretrained on the [CC100 multilingual dataset](https://huggingface.co/datasets/cc100). It was then fine-tuned on the [XNLI dataset](https://huggingface.co/datasets/xnli), which contains hypothesis-premise pairs from 15 languages, as well as on the English [MNLI dataset](https://huggingface.co/datasets/multi_nli).

XLM-V-base was published on 23 January 2023 in [this paper](https://arxiv.org/pdf/2301.10472.pdf). Its main innovation is a larger and better vocabulary: previous multilingual models had a vocabulary of 250,000 tokens, while XLM-V has a vocabulary of about 1 million tokens. The improved vocabulary allows for better representations of more languages.
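A larger vocabulary comes at a cost in model size, because every token needs its own row in the input embedding matrix (vocabulary size × hidden size). The rough estimate below counts only that matrix. It assumes a hidden size of 768 (the usual value for base-size encoders) and 4 bytes per parameter (fp32), and it uses the round vocabulary sizes from the paragraph above rather than the exact tokenizer sizes.

```python
# Rough estimate of input-embedding size for two vocabulary sizes.
# Assumptions (not read from the model configs): hidden size 768,
# fp32 weights, and the round vocabulary sizes quoted in the text.
HIDDEN_SIZE = 768
BYTES_PER_PARAM = 4  # fp32


def embedding_params(vocab_size: int, hidden_size: int = HIDDEN_SIZE) -> int:
    """Number of parameters in a vocab_size x hidden_size embedding matrix."""
    return vocab_size * hidden_size


def to_gigabytes(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Convert a parameter count to gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, vocab in [("250k vocabulary", 250_000), ("XLM-V (~1M vocabulary)", 1_000_000)]:
    params = embedding_params(vocab)
    print(f"{name}: {params:,} embedding parameters, ~{to_gigabytes(params):.2f} GB in fp32")
```

Under these assumptions, the embedding table alone has about 768 million parameters, which is roughly 3 GB in fp32. That is far more than the roughly 85 million parameters of a 12-layer, 768-wide encoder stack, so most of the model's size comes from its vocabulary.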
### How to use the model

#### Simple zero-shot classification pipeline
```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="MoritzLaurer/xlm-v-base-mnli-xnli")

sequence_to_classify = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
candidate_labels = ["politics", "economy", "entertainment", "environment"]
output = classifier(sequence_to_classify, candidate_labels, multi_label=False)
print(output)
```
#### NLI use-case

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model_name = "MoritzLaurer/xlm-v-base-mnli-xnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.to(device)  # the model must be on the same device as its inputs
model.eval()      # disable dropout for inference

premise = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
hypothesis = "Emmanuel Macron is the President of France"

# Encode the premise-hypothesis pair and move all tensors (ids, attention mask) to the device
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():
    output = model(**inputs)

# Convert the logits of the single example into class probabilities (in percent)
prediction = torch.softmax(output.logits[0], -1).tolist()
label_names = ["entailment", "neutral", "contradiction"]
prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
print(prediction)
```
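Zero-shot classification reuses this NLI head. In the Hugging Face zero-shot pipeline, each candidate label is inserted into a hypothesis template (by default `"This example is {}."`), and each premise-hypothesis pair is scored by the NLI model. The resulting logits are then aggregated into label scores. The plain-Python sketch below mimics that aggregation step so it can be inspected without downloading the model. It is a simplified illustration, not the pipeline's actual code. The logits in the example are invented, and the index order (entailment, neutral, contradiction) is assumed to match `label_names` above.

```python
import math
from typing import Dict, List, Tuple

# Assumed logit order per premise-hypothesis pair, matching label_names above.
ENTAILMENT, NEUTRAL, CONTRADICTION = 0, 1, 2


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(nli_logits: Dict[str, Tuple[float, float, float]],
                     multi_label: bool = False) -> Dict[str, float]:
    """Turn per-label NLI logits into zero-shot label scores.

    multi_label=False: softmax over the entailment logits of all labels,
    so the scores sum to 1 and the labels compete with each other.
    multi_label=True: for each label, softmax over [contradiction, entailment]
    and keep the entailment probability, so labels are scored independently.
    """
    labels = list(nli_logits)
    if multi_label:
        return {label: softmax([nli_logits[label][CONTRADICTION],
                                nli_logits[label][ENTAILMENT]])[1]
                for label in labels}
    probs = softmax([nli_logits[label][ENTAILMENT] for label in labels])
    return dict(zip(labels, probs))


# Hypothetical logits for the German premise paired with
# "This example is {label}." for each candidate label (illustrative numbers only).
example_logits = {
    "politics": (3.0, 0.5, -2.0),
    "economy": (-1.0, 1.0, 1.5),
    "entertainment": (-2.0, 0.0, 2.5),
    "environment": (-1.5, 0.5, 2.0),
}
print(zero_shot_scores(example_logits))
print(zero_shot_scores(example_logits, multi_label=True))
```

The `multi_label` switch in the sketch corresponds to the `multi_label` argument in the pipeline example. With `False`, exactly one label is expected to fit. With `True`, several labels can receive high scores at once.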
### Training data

This model was trained on the XNLI development dataset and the MNLI training dataset. The XNLI development set consists of 2,490 English texts that were professionally translated into 14 other languages (37,350 texts in total across all 15 languages; see [this paper](https://arxiv.org/pdf/1809.05053.pdf)). Note that XNLI also contains a training set of machine-translated versions of the MNLI dataset for its 15 languages. Because of quality issues with these machine translations, this model was only trained on the professional translations from the XNLI development set and the original English MNLI training set (392,702 texts).

Not using machine-translated texts can help the model avoid overfitting to the 15 languages. It can also help avoid catastrophic forgetting of the ~101 other languages XLM-V was pre-trained on, and it significantly reduces training costs.
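The composition of the training set follows directly from the counts given above. This short tally, which uses only those numbers, shows that the English MNLI data dominates the mix and that roughly 8% of the training pairs are non-English.

```python
# Tally of the fine-tuning data described above (all counts taken from the text).
XNLI_DEV_PER_LANGUAGE = 2_490   # professionally translated pairs per language
XNLI_LANGUAGES = 15             # English + 14 translations
MNLI_TRAIN = 392_702            # original English MNLI training pairs

xnli_dev_total = XNLI_DEV_PER_LANGUAGE * XNLI_LANGUAGES           # all XNLI dev pairs
english_total = MNLI_TRAIN + XNLI_DEV_PER_LANGUAGE                # MNLI + English XNLI dev
non_english_total = XNLI_DEV_PER_LANGUAGE * (XNLI_LANGUAGES - 1)  # 14 translated languages
train_total = MNLI_TRAIN + xnli_dev_total

print(f"XNLI dev pairs:       {xnli_dev_total:,}")
print(f"English pairs:        {english_total:,}")
print(f"Total training pairs: {train_total:,}")
print(f"Non-English share:    {non_english_total / train_total:.1%}")
```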
### Training procedure

xlm-v-base-mnli-xnli was trained using the Hugging Face `Trainer` with the following hyperparameters.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # placeholder: the output directory is not given in this card
    num_train_epochs=3,              # total number of training epochs
    learning_rate=2e-05,
    per_device_train_batch_size=32,  # batch size per device during training
    per_device_eval_batch_size=120,  # batch size for evaluation
    warmup_ratio=0.06,               # fraction of total training steps used for learning-rate warmup
    weight_decay=0.01,               # strength of weight decay
)
```
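`warmup_ratio` is a fraction of the total number of optimization steps, not a step count. The sketch below estimates the resulting schedule. It assumes a single device, no gradient accumulation, the default linear scheduler, and the 430,052 training pairs implied by the training-data section (392,702 MNLI + 37,350 XNLI). The rounding follows what recent versions of the Hugging Face `Trainer` do (one partial batch at the end of each epoch, ceiling for warmup steps). The actual step counts of the original run are not reported in this card.

```python
import math

# Assumptions (not reported in the card): one device, no gradient accumulation,
# and a training set of 392,702 MNLI + 37,350 XNLI-dev pairs.
num_examples = 392_702 + 37_350
per_device_train_batch_size = 32
num_train_epochs = 3
warmup_ratio = 0.06

# The last batch of each epoch may be partial, hence the ceiling division.
steps_per_epoch = math.ceil(num_examples / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs
# The learning rate rises linearly from 0 to learning_rate over these steps.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"steps per epoch: {steps_per_epoch:,}")
print(f"total steps:     {total_steps:,}")
print(f"warmup steps:    {warmup_steps:,}")
```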
### Eval results

The model was evaluated on the XNLI test set in 15 languages (5,010 texts per language, 75,150 in total). Note that multilingual NLI models can classify NLI texts without receiving NLI training data in the specific language (cross-lingual transfer). This means that the model can also perform NLI on the ~101 other languages XLM-V was trained on, but performance is most likely lower than for the languages available in XNLI.

Also note that if other multilingual models on the model hub claim performance of around 90% on languages other than English, the authors most likely made a mistake during testing. None of the latest papers (of mostly larger models) shows a multilingual average performance of more than a few points above 80% on XNLI (see [here](https://arxiv.org/pdf/2111.09543.pdf) or [here](https://arxiv.org/pdf/1911.02116.pdf)).
The average XNLI performance of XLM-V reported in the paper is 0.76 ([see table 2](https://arxiv.org/pdf/2301.10472.pdf)). This reimplementation reaches an average performance of 0.78. The increase is probably due to the addition of MNLI to the training data. Note that [mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) has an average performance of 0.808. It is also smaller (560 MB for mDeBERTa vs. 3 GB for XLM-V) and faster (thanks to mDeBERTa's smaller vocabulary). The accuracy difference probably comes from mDeBERTa-v3's improved pre-training objective. Depending on the task, it is probably better to use [mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli), but XLM-V could be better for some languages thanks to its improved vocabulary.

|Datasets|average|ar|bg|de|el|en|es|fr|hi|ru|sw|th|tr|ur|vi|zh|
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|Accuracy|0.780|0.757|0.808|0.796|0.790|0.856|0.814|0.806|0.751|0.782|0.725|0.757|0.766|0.729|0.784|0.782|
|Speed GPU A100 (text/sec)|n/a|3501.0|3324.0|3438.0|3174.0|3713.0|3500.0|3129.0|3042.0|3419.0|3468.0|3782.0|3772.0|3099.0|3117.0|4217.0|

|Datasets|mnli_m (en)|mnli_mm (en)|
| :---: | :---: | :---: |
|Accuracy|0.852|0.854|
|Speed GPU A100 (text/sec)|2098.0|2170.0|
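The XNLI average is the unweighted mean over the 15 languages. Because every language has the same number of test texts (5,010), this mean also equals the accuracy over all 75,150 texts. The snippet below recomputes the average from the per-language values in the first table. It also shows the gap between English and the other languages, which is one way to quantify the cross-lingual transfer discussed above.

```python
from statistics import mean

# Per-language XNLI test accuracies copied from the first table above.
xnli_accuracy = {
    "ar": 0.757, "bg": 0.808, "de": 0.796, "el": 0.790, "en": 0.856,
    "es": 0.814, "fr": 0.806, "hi": 0.751, "ru": 0.782, "sw": 0.725,
    "th": 0.757, "tr": 0.766, "ur": 0.729, "vi": 0.784, "zh": 0.782,
}
TEXTS_PER_LANGUAGE = 5_010

average = mean(xnli_accuracy.values())
non_english = mean(v for lang, v in xnli_accuracy.items() if lang != "en")

print(f"languages: {len(xnli_accuracy)}, test texts: {TEXTS_PER_LANGUAGE * len(xnli_accuracy):,}")
print(f"average accuracy:         {average:.3f}")
print(f"average without English:  {non_english:.3f}")
print(f"gap English vs. others:   {xnli_accuracy['en'] - non_english:.3f}")
```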
## Limitations and bias

Please consult the original XLM-V paper and literature on different NLI datasets for potential biases.
## Citation

If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022. ‘Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT-NLI’. Preprint, June. Open Science Framework. https://osf.io/74b8k.
## Ideas for cooperation or questions?

If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or [LinkedIn](https://www.linkedin.com/in/moritz-laurer/).