
This model is a fine-tuned version of gsarti/it5-base, an IT5 model pre-trained on the Thoroughly Cleaned Italian mC4 Corpus (~41B words, ~275GB).
It is a T5-based Question Answering model for the Italian language.
It was fine-tuned on a translated Italian subset of the SQuAD 2.0 dataset (about 100k questions).
Because SQuAD 2.0 also contains questions that cannot be answered from the given paragraph, the model not only attempts to answer questions through reading comprehension but also abstains when the paragraph provided does not contain the answer.

You can test the model by entering a question followed by its context, as in the example below:

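In quale anno si è verificato il terremoto nel Sichuan?
Il terremoto del Sichuan del 2008 o il terremoto del Gran Sichuan, misurato a 8.0 Ms e 7.9 Mw, e si è verificato alle 02:28:01 PM China Standard Time all' epicentro (06:28:01 UTC) il 12 maggio nella provincia del Sichuan, ha ucciso 69.197 persone e lasciato 18.222 dispersi.

English translation of the example: "In which year did the Sichuan earthquake occur?" / "The 2008 Sichuan earthquake, or Great Sichuan earthquake, measured at 8.0 Ms and 7.9 Mw, occurred at 02:28:01 PM China Standard Time at the epicenter (06:28:01 UTC) on 12 May in Sichuan province; it killed 69,197 people and left 18,222 missing."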
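Outside the web widget, the same kind of input can be run locally with the Hugging Face `transformers` library. The sketch below is an assumption-laden illustration, not the author's inference code: this card does not give the model's Hub repository ID (so `model_id` must be supplied by the caller), it does not state how question and context are separated (the helper `build_input` joins them with a single space as a guess), and the 512-token truncation and `max_new_tokens=32` values are illustrative choices.

```python
def build_input(question: str, context: str) -> str:
    """Concatenate question and context into the single input string.

    Assumption: the two parts are joined with one space; the card only says
    "question + context" and does not state the separator used in training.
    """
    return f"{question.strip()} {context.strip()}"


def answer(question: str, context: str, model_id: str, max_new_tokens: int = 32) -> str:
    """Generate an answer with a seq2seq (T5-style) model from the Hugging Face Hub.

    model_id is the Hub repository of this model; it is not given in the card,
    so the caller must supply it. Requires `transformers` and `torch`.
    """
    # Imported lazily so build_input() works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    # Assumption: truncate long paragraphs to 512 input tokens.
    inputs = tokenizer(
        build_input(question, context),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the generated answer text, dropping padding/end-of-sequence tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call would look like `answer("In quale anno si è verificato il terremoto nel Sichuan?", "Il terremoto del Sichuan del 2008 ...", model_id="<this-model-repo-id>")`, where the repository ID placeholder must be replaced with the actual one. Loading the tokenizer and model inside `answer` keeps the sketch short; for repeated queries you would load them once and reuse them.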

The trained model achieves the following results:

  • EM: 78.69
  • F1: 84.69
  • rouge1: precision=0.862, recall=0.849, fmeasure=0.845
  • rouge2: precision=0.309, recall=0.300, fmeasure=0.298
  • rougeL: precision=0.862, recall=0.849, fmeasure=0.845
  • rougeLsum: precision=0.862, recall=0.849, fmeasure=0.845
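EM and F1 are the standard SQuAD metrics. The sketch below follows the logic of the official SQuAD 2.0 evaluation script: answers are normalized (lowercased, punctuation and the English articles "a", "an", "the" removed, whitespace collapsed), F1 is computed over overlapping tokens, each prediction takes its best score over all gold answers, and an unanswerable question (empty gold answer) scores 1 only if the prediction is also empty. It is an illustration, not necessarily the script used to produce the numbers above; in particular, the card does not say whether Italian articles were stripped during normalization.

```python
import collections
import re
import string

PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> int:
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a prediction and one gold answer."""
    pred_toks = normalize_answer(prediction).split()
    gold_toks = normalize_answer(gold).split()
    if not pred_toks or not gold_toks:
        # Unanswerable questions have an empty gold answer: the model
        # scores only if it also abstains (empty prediction).
        return float(pred_toks == gold_toks)
    common = collections.Counter(pred_toks) & collections.Counter(gold_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def score(predictions: list[str], references: list[list[str]]) -> dict[str, float]:
    """Average EM and F1 (as percentages), taking the best match over gold answers.

    references[i] lists the gold answers for question i; use [""] for an
    unanswerable question.
    """
    em = f1 = 0.0
    for pred, golds in zip(predictions, references):
        em += max(exact_match(pred, g) for g in golds)
        f1 += max(f1_score(pred, g) for g in golds)
    n = len(predictions)
    return {"EM": 100 * em / n, "F1": 100 * f1 / n}
```

For example, `score(["2008", "il 12 maggio"], [["2008", "nel 2008"], [""]])` returns `{"EM": 50.0, "F1": 50.0}`: the first answer matches exactly, while the second question is unanswerable but the model did not abstain. F1 is higher than EM whenever predictions overlap the gold answer only partially (e.g. "nel 2008" against "2008" gives F1 ≈ 0.67 but EM 0).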