metadata

license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - cnn_dailymail
metrics:
  - rouge
model-index:
  - name: base
    results:
      - task:
          name: Summarization
          type: summarization
        dataset:
          name: cnn_dailymail 3.0.0
          type: cnn_dailymail
          config: 3.0.0
          split: validation
          args: 3.0.0
        metrics:
          - name: Rouge1
            type: rouge
            value: 42.1388

base

This model is a fine-tuned version of google/flan-t5-base on the cnn_dailymail 3.0.0 dataset. It achieves the following results on the evaluation set:

Loss: 1.4232
Rouge1: 42.1388
Rouge2: 19.7696
RougeL: 30.1512
RougeLsum: 39.3222
Gen Len: 71.8562
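To make the Rouge1 figure above easier to interpret, here is a minimal sketch of a unigram-overlap F1 score in plain Python. It is an illustration only: the function name `rouge1_f1` and the whitespace tokenization are assumptions, and the scores reported in this card were presumably produced by a standard ROUGE implementation with its own tokenization and stemming options, so numbers from this sketch will not match exactly.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary.

    Simplified illustration: lowercases and splits on whitespace only.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
score = rouge1_f1(reference, candidate)
# Scaled by 100 to match the convention used for the scores in this card.
print(f"ROUGE-1 F1: {100 * score:.2f}")
```

Rouge2 applies the same idea to bigrams, and RougeL is based on the longest common subsequence rather than n-gram counts.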

Model description

Model type: Language model
Language(s) (NLP): English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian
License: Apache 2.0
Related Models: All FLAN-T5 Checkpoints
Original Checkpoints: All Original FLAN-T5 Checkpoints
Resources for more information:

Intended uses & limitations

The information in this section is copied from the official Flan-T5 model card:

Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application without a prior assessment of safety and fairness concerns specific to the application.
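For the summarization task this checkpoint was fine-tuned on, inference might look like the sketch below. Several things here are assumptions rather than facts from this card: `MODEL_ID` is a hypothetical placeholder for the actual Hub repository id, the `"summarize: "` prefix is the common T5 convention and may differ from the prompt used during training, and the 512-token input limit and `max_new_tokens=128` are illustrative defaults.

```python
MODEL_ID = "your-namespace/base"  # hypothetical placeholder; replace with the real Hub repo id
PREFIX = "summarize: "  # assumption: common T5 summarization prefix


def build_prompt(article: str) -> str:
    """Prepend the task prefix and collapse runs of whitespace."""
    return PREFIX + " ".join(article.split())


def summarize(article: str, max_new_tokens: int = 128) -> str:
    """Generate a summary for one article (downloads the model on first use)."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(
        build_prompt(article),
        return_tensors="pt",
        truncation=True,
        max_length=512,  # assumption: illustrative input limit
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(article_text)` returns the generated summary as a string. Outputs should be reviewed before use in any application, in line with the limitations above.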

Training and evaluation data

The model was fine-tuned on the cnn_dailymail dataset (config 3.0.0) and evaluated on its validation split. The evaluation results are listed at the top of this card.

Training procedure

An example notebook for fine-tuning Flan-T5 and pushing the result to the Hub is available at https://github.com/EveripediaNetwork/ai/blob/main/notebooks/Fine-Tuning-Flan-T5_1.ipynb

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 1
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 64
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: Constant
num_epochs: 3.0
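The hyperparameters above could be expressed as `Seq2SeqTrainingArguments` roughly as follows. This is a sketch, not the configuration actually used: it assumes training on a single device (which is consistent with 1 × 64 = 64 for the total batch size), `predict_with_generate=True` is an assumption made so that ROUGE can be computed on generated text, and the helper names are hypothetical.

```python
# Hyperparameters from this card, mapped to TrainingArguments field names.
HYPERPARAMETERS = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "gradient_accumulation_steps": 64,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 3.0,
}


def total_train_batch_size(hp: dict, num_devices: int = 1) -> int:
    """Effective batch size per optimizer step."""
    return (
        hp["per_device_train_batch_size"]
        * hp["gradient_accumulation_steps"]
        * num_devices
    )


def build_training_args(output_dir: str):
    """Build training arguments (requires transformers to be installed)."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,
        predict_with_generate=True,  # assumption: not stated in this card
        **HYPERPARAMETERS,
    )


print(total_train_batch_size(HYPERPARAMETERS))
```

With a per-device batch size of 1, gradient accumulation over 64 steps is what yields the total train batch size of 64 listed above.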

Framework versions

Transformers 4.27.0.dev0
PyTorch 1.13.0+cu117
Datasets 2.7.1
Tokenizers 0.12.1