---
license: apache-2.0
base_model: google/mt5-base
tags:
  - generated_from_trainer
metrics:
  - rouge
  - sacrebleu
model-index:
  - name: mT5-TextSimp-LT-BatchSize2-lr1e-4
    results: []
---

mT5-TextSimp-LT-BatchSize2-lr1e-4

This model is a fine-tuned version of google/mt5-base on an unspecified dataset. It achieves the following results on the evaluation set (a brief usage sketch follows the metrics):

  • Loss: 0.0672
  • Rouge1: 0.7548
  • Rouge2: 0.5989
  • RougeL: 0.7509
  • Sacrebleu: 49.0373
  • Gen Len: 38.0501
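
The snippet below is a minimal usage sketch, not a documented API for this model: the repo id is assumed from the card header (user eglkan1 plus the model name), and the input sentence and generation settings are illustrative only.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repo id ("eglkan1" + the model name on this card).
repo_id = "eglkan1/mT5-TextSimp-LT-BatchSize2-lr1e-4"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)

# Illustrative Lithuanian input; the card does not document the expected format.
text = "Vilnius yra Lietuvos Respublikos sostinė ir didžiausias šalies miestas."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```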

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 8
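
The sketch below reconstructs this configuration with `Seq2SeqTrainer`, under stated assumptions: the dataset is a placeholder (the card does not name it), the 200-step eval cadence is inferred from the results table below, and the Adam betas/epsilon listed above match the Trainer defaults, so they are not set explicitly.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")

# Placeholder data: the card does not name the training set.
raw = Dataset.from_dict({
    "src": ["Sudėtingas sakinys, kurį reikia supaprastinti."],
    "tgt": ["Paprastas sakinys."],
})

def preprocess(batch):
    enc = tokenizer(batch["src"], truncation=True, max_length=512)
    enc["labels"] = tokenizer(
        text_target=batch["tgt"], truncation=True, max_length=512
    )["input_ids"]
    return enc

tokenized = raw.map(preprocess, batched=True, remove_columns=["src", "tgt"])

args = Seq2SeqTrainingArguments(
    output_dir="mT5-TextSimp-LT-BatchSize2-lr1e-4",
    learning_rate=1e-4,             # learning_rate: 0.0001
    per_device_train_batch_size=2,  # train_batch_size: 2
    per_device_eval_batch_size=2,   # eval_batch_size: 2
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=8,
    evaluation_strategy="steps",    # assumption, inferred from the table below
    eval_steps=200,
    predict_with_generate=True,     # needed for ROUGE/SacreBLEU during eval
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    eval_dataset=tokenized,         # placeholder; a real eval split belongs here
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```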

Training results

Training Loss  Epoch  Step  Validation Loss  Rouge1  Rouge2  RougeL  Sacrebleu   Gen Len
      25.6783   0.24   200          16.0497  0.0109  0.0005  0.0107     0.0029  512.0000
       1.9593   0.48   400           0.7780  0.0140  0.0005  0.0136     0.0146   42.6850
       0.2778   0.72   600           0.1429  0.4924  0.3128  0.4803    20.3057   38.0382
       0.1325   0.96   800           0.1039  0.6193  0.4369  0.6098    33.6870   38.0501
       0.1702   1.20  1000           0.0958  0.6697  0.5016  0.6613    38.0391   38.0501
       0.1300   1.44  1200           0.0880  0.6737  0.5051  0.6644    38.6200   38.0501
       0.1086   1.67  1400           0.0839  0.6964  0.5326  0.6884    40.9056   38.0501
       0.0716   1.91  1600           0.0859  0.6933  0.5298  0.6862    40.7158   38.0501
       0.1135   2.15  1800           0.0820  0.7017  0.5366  0.6936    40.7484   38.0501
       0.0997   2.39  2000           0.0814  0.7011  0.5351  0.6945    41.1948   38.0501
       0.0996   2.63  2200           0.0774  0.7103  0.5522  0.7049    42.5756   38.0501
       1.1379   2.87  2400           0.0763  0.7211  0.5556  0.7152    43.2411   38.0501
       0.0594   3.11  2600           0.0776  0.7261  0.5647  0.7201    44.2205   38.0501
       0.0763   3.35  2800           0.0736  0.7309  0.5709  0.7251    45.2825   38.0501
       0.1641   3.59  3000           0.0722  0.7297  0.5685  0.7242    44.9001   38.0501
       0.1085   3.83  3200           0.0703  0.7377  0.5793  0.7319    45.7504   38.0501
       0.0573   4.07  3400           0.0719  0.7393  0.5796  0.7335    45.8600   38.0501
       0.1149   4.31  3600           0.0705  0.7415  0.5787  0.7365    46.2652   38.0501
       0.0843   4.55  3800           0.0703  0.7385  0.5754  0.7326    46.5292   38.0501
       0.0658   4.78  4000           0.0705  0.7437  0.5855  0.7384    46.8640   38.0501
       0.0676   5.02  4200           0.0694  0.7437  0.5840  0.7384    47.1268   38.0501
       0.0657   5.26  4400           0.0711  0.7473  0.5913  0.7432    47.4413   38.0501
       0.0679   5.50  4600           0.0702  0.7496  0.5908  0.7446    47.8281   38.0501
       0.0664   5.74  4800           0.0671  0.7511  0.5929  0.7463    47.7693   38.0501
       0.0446   5.98  5000           0.0685  0.7533  0.5932  0.7478    48.0320   38.0501
       0.0732   6.22  5200           0.0678  0.7523  0.5948  0.7472    48.3467   38.0501
       0.0706   6.46  5400           0.0672  0.7550  0.5983  0.7507    48.6158   38.0501
       0.0510   6.70  5600           0.0674  0.7523  0.5961  0.7478    48.4828   38.0501
       0.0670   6.94  5800           0.0681  0.7532  0.5978  0.7492    48.7253   38.0501
       0.0750   7.18  6000           0.0684  0.7534  0.5969  0.7492    48.7053   38.0501
       0.1323   7.42  6200           0.0671  0.7550  0.5991  0.7511    48.9922   38.0501
       0.0383   7.66  6400           0.0671  0.7551  0.5994  0.7511    49.0028   38.0501
       0.0599   7.89  6600           0.0672  0.7548  0.5989  0.7509    49.0373   38.0501
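
Scores like the ROUGE and SacreBLEU columns above are typically computed with the `evaluate` library; the sketch below shows one common way. The prediction/reference pair is illustrative, not from the actual eval set.

```python
import evaluate

# Illustrative strings; the real eval predictions/references are not published here.
predictions = ["Paprastas sakinys."]
references = ["Paprastas sakinys."]

rouge = evaluate.load("rouge")
sacrebleu = evaluate.load("sacrebleu")

print(rouge.compute(predictions=predictions, references=references))
# {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}

print(sacrebleu.compute(predictions=predictions,
                        references=[[r] for r in references])["score"])
# SacreBLEU returns a 0-100 score, matching the scale of the table above.
```

"Gen Len" is typically the mean token length of the generated outputs over the eval set.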

Framework versions

  • Transformers 4.33.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.4
  • Tokenizers 0.13.3
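
A quick way to check that a local environment matches these versions (assuming the standard PyPI distributions are installed):

```python
import datasets, tokenizers, torch, transformers

# Versions reported on this card: 4.33.0 / 2.1.2+cu121 / 2.14.4 / 0.13.3
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("datasets:", datasets.__version__)
print("tokenizers:", tokenizers.__version__)
```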