---
language: it
tags:
- text-classification
- pytorch
- tensorflow
datasets:
- multi_nli
- glue
license: mit
pipeline_tag: zero-shot-classification
widget:
- text: "La seconda guerra mondiale vide contrapporsi, tra il 1939 e il 1945, le cosiddette\
    \ potenze dell'Asse e gli Alleati che, come gi\xE0 accaduto ai belligeranti della\
    \ prima guerra mondiale, si combatterono su gran parte del pianeta; il conflitto\
    \ ebbe inizio il 1\xBA settembre 1939 con l'attacco della Germania nazista alla\
    \ Polonia e termin\xF2, nel teatro europeo, l'8 maggio 1945 con la resa tedesca\
    \ e, in quello asiatico, il successivo 2 settembre con la resa dell'Impero giapponese\
    \ dopo i bombardamenti atomici di Hiroshima e Nagasaki."
  candidate_labels: guerra, storia, moda, cibo
  multi_class: true
model-index:
- name: Jiva/xlm-roberta-large-it-mnli
  results:
  - task:
      type: natural-language-inference
      name: Natural Language Inference
    dataset:
      name: glue
      type: glue
      config: mnli
      split: validation_matched
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.8819154355578197
      verified: true
    - name: Precision Macro
      type: precision
      value: 0.8814638070461666
      verified: true
    - name: Precision Micro
      type: precision
      value: 0.8819154355578197
      verified: true
    - name: Precision Weighted
      type: precision
      value: 0.881571663280083
      verified: true
    - name: Recall Macro
      type: recall
      value: 0.8802419956104793
      verified: true
    - name: Recall Micro
      type: recall
      value: 0.8819154355578197
      verified: true
    - name: Recall Weighted
      type: recall
      value: 0.8819154355578197
      verified: true
    - name: F1 Macro
      type: f1
      value: 0.8802937937959167
      verified: true
    - name: F1 Micro
      type: f1
      value: 0.8819154355578197
      verified: true
    - name: F1 Weighted
      type: f1
      value: 0.8811955957302677
      verified: true
    - name: loss
      type: loss
      value: 0.3171548545360565
      verified: true
---

# XLM-roBERTa-large-it-mnli

## Version 0.1
| Model                     | matched-it acc | mismatched-it acc |
| ------------------------- | -------------- | ----------------- |
| XLM-roBERTa-large-it-mnli | 84.75          | 85.39             |

## Model Description
This model takes [xlm-roberta-large](https://huggingface.co/xlm-roberta-large) and fine-tunes it on a subset of NLI data taken from an automatically translated version of the MNLI corpus. It is intended to be used for zero-shot text classification, such as with the Hugging Face [ZeroShotClassificationPipeline](https://huggingface.co/transformers/master/main_classes/pipelines.html#transformers.ZeroShotClassificationPipeline).
## Intended Usage
This model is intended to be used for zero-shot text classification of Italian texts.
Since the base model was pre-trained on 100 different languages, the
model has shown some effectiveness in languages other than Italian as
well. See the full list of pre-trained languages in appendix A of the
[XLM-RoBERTa paper](https://arxiv.org/abs/1911.02116).
For English-only classification, it is recommended to use
[bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) or
[a distilled bart MNLI model](https://huggingface.co/models?filter=pipeline_tag%3Azero-shot-classification&search=valhalla).
#### With the zero-shot classification pipeline
The model can be loaded with the `zero-shot-classification` pipeline like so:
```python
from transformers import pipeline
# device=0 runs on the first GPU; use device=-1 to run on CPU
classifier = pipeline("zero-shot-classification",
                      model="Jiva/xlm-roberta-large-it-mnli", device=0, use_fast=True, multi_label=True)
```
You can then classify Italian text. Because the base model is multilingual, you can even pass the labels in one language and the sequence to
classify in another:
```python
# we will classify the following Wikipedia entry about Sardinia
sequence_to_classify = "La Sardegna è una regione italiana a statuto speciale di 1 592 730 abitanti con capoluogo Cagliari, la cui denominazione bilingue utilizzata nella comunicazione ufficiale è Regione Autonoma della Sardegna / Regione Autònoma de Sardigna."
# we can specify candidate labels in Italian:
candidate_labels = ["geografia", "politica", "macchine", "cibo", "moda"]
classifier(sequence_to_classify, candidate_labels)
# {'labels': ['geografia', 'moda', 'politica', 'macchine', 'cibo'],
# 'scores': [0.38871392607688904, 0.22633370757102966, 0.19398456811904907, 0.13735772669315338, 0.13708525896072388]}
```
The pipeline's default hypothesis template is in English: `This example is {}.`. With this model, better results are achieved by providing a translated template:
```python
sequence_to_classify = "La Sardegna è una regione italiana a statuto speciale di 1 592 730 abitanti con capoluogo Cagliari, la cui denominazione bilingue utilizzata nella comunicazione ufficiale è Regione Autonoma della Sardegna / Regione Autònoma de Sardigna."
candidate_labels = ["geografia", "politica", "macchine", "cibo", "moda"]
hypothesis_template = "si parla di {}"
classifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template)
# 'scores': [0.6068345904350281, 0.34715887904167175, 0.32433947920799255, 0.3068877160549164, 0.18744681775569916]
```
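To make the role of the template concrete, the following is a minimal sketch of how a template and a list of candidate labels expand into one NLI hypothesis per label, using plain string formatting. The helper name `build_hypotheses` is hypothetical and only illustrates the idea; it is not part of the `transformers` API.

```python
# Hypothetical helper (not part of transformers): shows how a template
# and candidate labels turn into one NLI hypothesis per label.
def build_hypotheses(candidate_labels, hypothesis_template="si parla di {}"):
    if "{}" not in hypothesis_template:
        raise ValueError("the template needs a {} placeholder for the label")
    return [hypothesis_template.format(label) for label in candidate_labels]

hypotheses = build_hypotheses(["geografia", "politica", "moda"])
print(hypotheses)
# ['si parla di geografia', 'si parla di politica', 'si parla di moda']
```

Each hypothesis is then paired with the sequence to classify (the premise) and scored by the NLI model.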
#### With manual PyTorch
```python
# pose sequence as a NLI premise and label as a hypothesis
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
nli_model = AutoModelForSequenceClassification.from_pretrained('Jiva/xlm-roberta-large-it-mnli').to(device)
tokenizer = AutoTokenizer.from_pretrained('Jiva/xlm-roberta-large-it-mnli')

# example inputs: the text to classify and one candidate label
sequence = "La Sardegna è una regione italiana a statuto speciale con capoluogo Cagliari."
label = "geografia"
premise = sequence
hypothesis = f'si parla di {label}.'

# run through model pre-trained on MNLI
x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                     truncation='only_first')
with torch.no_grad():
    logits = nli_model(x.to(device))[0]

# we throw away "neutral" (dim 1) and take the probability of
# "entailment" (2) as the probability of the label being true
# (this index order is assumed; check nli_model.config.id2label)
entail_contradiction_logits = logits[:,[0,2]]
probs = entail_contradiction_logits.softmax(dim=1)
prob_label_is_true = probs[:,1]
```
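The last three lines of the snippet above reduce to a two-way softmax over the contradiction and entailment logits. The sketch below works through that arithmetic with the standard library only, so it runs without downloading the model. The logits are made-up numbers, and the index order `[contradiction, neutral, entailment]` is the assumption stated in the snippet's comments.

```python
import math

def prob_label_is_true(logits, contradiction_idx=0, entailment_idx=2):
    """Two-way softmax over the contradiction and entailment logits."""
    c = logits[contradiction_idx]
    e = logits[entailment_idx]
    m = max(c, e)  # subtract the max for numerical stability
    exp_c = math.exp(c - m)
    exp_e = math.exp(e - m)
    return exp_e / (exp_c + exp_e)

# made-up logits for illustration, ordered [contradiction, neutral, entailment]
example_logits = [-1.0, 0.5, 2.0]
p = prob_label_is_true(example_logits)
print(round(p, 4))
# 0.9526
```

Because the neutral logit is ignored, the score depends only on the gap between the entailment and contradiction logits.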
## Training

## Version 0.1
The model has now been retrained on the full training set. Around 1000 sentence pairs were removed from the set because the translation model produced unusable translations for them.

| metric          	| value 	|
|-----------------	|-------	|
| learning_rate    	| 4e-6  	|
| optimizer       	| AdamW 	|
| batch_size      	| 80    	|
| mcc             	| 0.77  	|
| train_loss      	| 0.34  	|
| eval_loss       	| 0.40  	|
| stopped_at_step 	| 9754  	|

## Version 0.0
This model was pre-trained on a set of 100 languages, as described in
[the original paper](https://arxiv.org/abs/1911.02116). It was then fine-tuned on the task of NLI on an Italian translation of the MNLI dataset (only 85% of the train set in this version). The model used for translating the texts is Helsinki-NLP/opus-mt-en-it, with a max output sequence length of 120. The model was trained for 1 epoch with learning rate 4e-6 and batch size 80. In this version it scores 82% accuracy on the remaining 15% of the training set.