---
language:
- it
model-index:
- name: medieval-it5-base
  results: []
---
# medieval-it5-base
This model is a version of gsarti/it5-base fine-tuned on a dataset called ita2medieval. The dataset contains sentences in medieval Italian paired with paraphrases in contemporary Italian (approximately 6.5k pairs in total).

The fine-tuning task is text style transfer from contemporary to medieval Italian.
## Using the model
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("leobertolazzi/medieval-it5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("leobertolazzi/medieval-it5-base")
```
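Once loaded, the model can be prompted with a contemporary Italian sentence and asked to generate its medieval rendition. A minimal generation sketch (the input sentence is an arbitrary example, not taken from the dataset):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("leobertolazzi/medieval-it5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("leobertolazzi/medieval-it5-base")

# A contemporary Italian sentence (arbitrary example)
text = "Ho incontrato un uomo saggio lungo la strada."

# Tokenize, generate, and decode the medieval-style paraphrase
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
medieval = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(medieval)
```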
Flax and TensorFlow versions of the model are also available:

```python
from transformers import FlaxT5ForConditionalGeneration, TFT5ForConditionalGeneration

model_flax = FlaxT5ForConditionalGeneration.from_pretrained("leobertolazzi/medieval-it5-base")
model_tf = TFT5ForConditionalGeneration.from_pretrained("leobertolazzi/medieval-it5-base")
```
## Training procedure
The code used for the fine-tuning is available in this repo.
## Intended uses & limitations
The biggest limitation of this project is the size of the ita2medieval dataset: it consists of only 6.5k sentence pairs, whereas gsarti/it5-base has 220M parameters. For this reason the results can be far from perfect, but some nice style translations can still be obtained.

It would be great to expand ita2medieval with text and paraphrases from more medieval Italian authors!
## Framework versions
- Transformers 4.26.0
- Tokenizers 0.13.2