Update README.md
README.md
CHANGED
@@ -1,54 +1,97 @@
---
tags:
datasets:
metrics:
- rouge
model-index:
- name: it5-efficient-small-el32-
  results:
  - task:
    dataset:
      args: fst
    metrics:
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

This model achieves the following results on the evaluation set:
- Loss: 2.2160
- Rouge1: 56.585
- Rouge2: 36.9335
- Rougel: 53.7782
- Rougelsum: 53.7779
- Gen Len: 13.0891

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
@@ -61,43 +104,10 @@ The following hyperparameters were used during training:

- lr_scheduler_type: linear
- num_epochs: 10.0

### Training results

| Training Loss | Epoch | Step   | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:------:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 2.9377        | 0.35  | 5000   | 2.5157          | 54.6148 | 35.1518 | 51.8908 | 51.8957   | 12.8717 |
| 2.803         | 0.7   | 10000  | 2.4086          | 55.641  | 36.1214 | 52.8683 | 52.8572   | 12.7513 |
| 2.5483        | 1.05  | 15000  | 2.3420          | 55.6604 | 36.0085 | 52.9599 | 52.9433   | 12.7754 |
| 2.4978        | 1.39  | 20000  | 2.3145          | 56.204  | 36.5896 | 53.338  | 53.3351   | 12.8804 |
| 2.5383        | 1.74  | 25000  | 2.2697          | 56.1356 | 36.6963 | 53.3579 | 53.3664   | 12.795  |
| 2.3368        | 2.09  | 30000  | 2.2603          | 56.0271 | 36.4249 | 53.3113 | 53.3272   | 12.7478 |
| 2.371         | 2.44  | 35000  | 2.2328          | 56.5041 | 36.8718 | 53.8064 | 53.7995   | 12.8243 |
| 2.3567        | 2.79  | 40000  | 2.2079          | 56.5318 | 36.9437 | 53.8359 | 53.8254   | 12.6851 |
| 2.1753        | 3.14  | 45000  | 2.2168          | 56.3831 | 36.8896 | 53.6542 | 53.6708   | 12.67   |
| 2.2069        | 3.48  | 50000  | 2.2055          | 56.7171 | 37.1665 | 53.9299 | 53.9259   | 12.8014 |
| 2.2396        | 3.83  | 55000  | 2.1801          | 56.936  | 37.5465 | 54.1064 | 54.1125   | 12.7989 |
| 2.0657        | 4.18  | 60000  | 2.1915          | 56.6312 | 37.1618 | 53.8646 | 53.8791   | 12.6987 |
| 2.0806        | 4.53  | 65000  | 2.1809          | 56.6599 | 37.1282 | 53.8838 | 53.8781   | 12.715  |
| 2.0933        | 4.88  | 70000  | 2.1771          | 56.5891 | 36.9461 | 53.8058 | 53.8087   | 12.6593 |
| 1.9949        | 5.23  | 75000  | 2.1932          | 56.4956 | 36.9679 | 53.7634 | 53.7731   | 12.6723 |
| 1.9954        | 5.57  | 80000  | 2.1813          | 56.4827 | 36.8319 | 53.6397 | 53.6399   | 12.6599 |
| 1.9912        | 5.92  | 85000  | 2.1755          | 56.6723 | 37.0432 | 53.8339 | 53.8233   | 12.7534 |
| 1.9068        | 6.27  | 90000  | 2.1849          | 56.6574 | 37.0691 | 53.9029 | 53.892    | 12.7037 |
| 1.9173        | 6.62  | 95000  | 2.1787          | 56.5701 | 36.861  | 53.6855 | 53.6699   | 12.6467 |
| 1.9131        | 6.97  | 100000 | 2.1862          | 56.7175 | 37.0749 | 53.8761 | 53.8794   | 12.7072 |
| 1.8164        | 7.32  | 105000 | 2.1999          | 56.6104 | 37.0809 | 53.8098 | 53.8216   | 12.6364 |
| 1.8489        | 7.66  | 110000 | 2.1945          | 56.6645 | 37.1267 | 53.9009 | 53.9008   | 12.5741 |
| 1.82          | 8.01  | 115000 | 2.2075          | 56.6075 | 37.0359 | 53.8792 | 53.8833   | 12.6428 |
| 1.772         | 8.36  | 120000 | 2.2067          | 56.4716 | 36.8675 | 53.6826 | 53.6742   | 12.6591 |
| 1.7795        | 8.71  | 125000 | 2.2056          | 56.4112 | 36.9011 | 53.6554 | 53.6495   | 12.608  |
| 1.72          | 9.06  | 130000 | 2.2197          | 56.4735 | 36.9255 | 53.6592 | 53.6463   | 12.6758 |
| 1.7174        | 9.41  | 135000 | 2.2169          | 56.4209 | 36.8139 | 53.5778 | 53.5685   | 12.6568 |
| 1.7466        | 9.75  | 140000 | 2.2165          | 56.3715 | 36.767  | 53.555  | 53.5468   | 12.6416 |

### Framework versions

- Transformers 4.15.0
- Pytorch 1.10.0+cu102
- Datasets 1.17.0
- Tokenizers 0.10.3

---
language:
- it
license: apache-2.0
tags:
- italian
- sequence-to-sequence
- style-transfer
- efficient
- formality-style-transfer
datasets:
- yahoo/xformal_it
widget:
- text: "Questa performance è a dir poco spiacevole."
- text: "In attesa di un Suo cortese riscontro, Le auguriamo un piacevole proseguimento di giornata."
- text: "Questa visione mi procura una goduria indescrivibile."
- text: "qualora ciò possa interessarti, ti pregherei di contattarmi."
metrics:
- rouge
- bertscore
model-index:
- name: it5-efficient-small-el32-formal-to-informal
  results:
  - task:
      type: formality-style-transfer
      name: "Formal-to-informal Style Transfer"
    dataset:
      type: xformal_it
      name: "XFORMAL (Italian Subset)"
    metrics:
    - type: rouge1
      value: 0.459
      name: "Avg. Test Rouge1"
    - type: rouge2
      value: 0.244
      name: "Avg. Test Rouge2"
    - type: rougeL
      value: 0.435
      name: "Avg. Test RougeL"
    - type: bertscore
      value: 0.739
      name: "Avg. Test BERTScore"
      args:
      - model_type: "dbmdz/bert-base-italian-xxl-uncased"
      - lang: "it"
      - num_layers: 10
      - rescale_with_baseline: True
      - baseline_path: "bertscore_baseline_ita.tsv"
---

# IT5 Cased Small Efficient EL32 for Formal-to-informal Style Transfer 🤗

*Shout-out to [Stefan Schweter](https://github.com/stefan-it) for contributing the pre-trained efficient model!*

This repository contains the checkpoint for the [IT5 Cased Small Efficient EL32](https://huggingface.co/it5/it5-efficient-small-el32) model fine-tuned for formal-to-informal style transfer on the Italian subset of the XFORMAL dataset, as part of the experiments of the paper [IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation](https://arxiv.org/abs/2203.03759) by [Gabriele Sarti](https://gsarti.com) and [Malvina Nissim](https://malvinanissim.github.io).
Efficient IT5 models differ from the standard ones by adopting a different vocabulary that enables cased text generation and an [optimized model architecture](https://arxiv.org/abs/2109.10686) that improves performance while reducing the parameter count. Small-EL32 replaces the original encoder of the T5 Small architecture with a 32-layer deep encoder, showing improved performance over the base model.
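
The deep-encoder setup can be checked directly from the checkpoint's configuration. A minimal sketch, assuming the model exposes the standard `T5Config` fields (`num_layers` for the encoder, `num_decoder_layers` for the decoder):

```python
from transformers import AutoConfig

# Load only the configuration (no weights) and inspect the encoder/decoder depths.
config = AutoConfig.from_pretrained("it5/it5-efficient-small-el32-formal-to-informal")
print(config.num_layers)          # encoder layers; 32 is expected for the EL32 variant
print(config.num_decoder_layers)  # decoder layers
```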

A comprehensive overview of other released materials is provided in the [gsarti/it5](https://github.com/gsarti/it5) repository. Refer to the paper for additional details concerning the reported scores and the evaluation approach.
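
The BERTScore arguments listed in the metadata above can be plugged straight into the [bert-score](https://github.com/Tiiiger/bert_score) library. A minimal sketch, assuming that library's `score` API; the prediction/reference strings are illustrative placeholders, and `bertscore_baseline_ita.tsv` is the baseline file referenced in the metadata:

```python
from bert_score import score

# Illustrative placeholder outputs and references (not from the paper's test set).
predictions = ["e grazie per la vostra disponibilità!"]
references = ["grazie mille per la vostra disponibilità!"]

# BERTScore configured with the args from the model card metadata.
P, R, F1 = score(
    predictions,
    references,
    model_type="dbmdz/bert-base-italian-xxl-uncased",
    lang="it",
    num_layers=10,
    rescale_with_baseline=True,
    baseline_path="bertscore_baseline_ita.tsv",
)
print(F1.mean().item())
```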

## Using the model

Model checkpoints are available for use in TensorFlow, PyTorch and JAX. They can be used directly with pipelines:

```python
from transformers import pipeline

f2i = pipeline("text2text-generation", model='it5/it5-efficient-small-el32-formal-to-informal')
f2i("Vi ringrazio infinitamente per vostra disponibilità")
>>> [{"generated_text": "e grazie per la vostra disponibilità!"}]
```

or loaded using autoclasses:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("it5/it5-efficient-small-el32-formal-to-informal")
model = AutoModelForSeq2SeqLM.from_pretrained("it5/it5-efficient-small-el32-formal-to-informal")
```
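
Once loaded with the autoclasses, generation follows the usual seq2seq pattern. A minimal sketch continuing from the snippet above (the input sentence and `max_new_tokens` value are illustrative choices, not from the model card):

```python
# Tokenize a formal sentence, generate, and decode the informal rewrite.
inputs = tokenizer("Vi ringrazio infinitamente per la vostra disponibilità.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```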

If you use this model in your research, please cite our work as:

```bibtex
@article{sarti-nissim-2022-it5,
    title={{IT5}: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation},
    author={Sarti, Gabriele and Nissim, Malvina},
    journal={ArXiv preprint 2203.03759},
    url={https://arxiv.org/abs/2203.03759},
    year={2022},
    month={mar}
}
```

### Training hyperparameters

- lr_scheduler_type: linear
- num_epochs: 10.0

### Framework versions

- Transformers 4.15.0
- Pytorch 1.10.0+cu102
- Datasets 1.17.0
- Tokenizers 0.10.3
|