---
license: apache-2.0
base_model: google/mt5-large
tags:
  - generated_from_trainer
metrics:
  - bleu
model-index:
  - name: cs_mT5-large_0.01_100_v0.1
    results: []
---

cs_mT5-large_0.01_100_v0.1

This model is a fine-tuned version of google/mt5-large on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how such metrics are typically computed follows the list):

  • Loss: 6.2112
  • Bleu: 0.8171
  • Gen Len: 19.0
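
Neither the dataset nor the evaluation script is documented here, but for generated_from_trainer cards the Bleu column is usually produced with the evaluate library. The sketch below is a minimal, hedged example of that computation; the choice of the sacrebleu metric and the example strings are assumptions, not confirmed details of this run.

```python
# Minimal sketch of a typical Bleu computation for Trainer-based fine-tunes.
# The sacrebleu metric choice and the example strings are assumptions.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["a hypothetical generated sentence"]   # placeholder model outputs
references = [["a hypothetical reference sentence"]]  # placeholder gold targets
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])
```

Note that whether the reported 0.8171 sits on a 0-1 or 0-100 scale depends on the metric implementation used during training, which is not documented here.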

Model description

More information needed

Intended uses & limitations

More information needed
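
Purely as illustration, the sketch below shows standard seq2seq inference with transformers. The Hub repo id, the Czech example input (suggested by the cs_ prefix in the model name), and the generation settings are all assumptions.

```python
# Minimal inference sketch; the repo id, input text, and task are assumptions
# inferred from the model name, not documented facts.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "kmok1/cs_mT5-large_0.01_100_v0.1"  # assumed Hub location
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Toto je ukázková věta.", return_tensors="pt")  # placeholder Czech input
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```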

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 0.01
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
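
A minimal reconstruction of these settings as Seq2SeqTrainingArguments is sketched below. The output_dir, evaluation strategy, and predict_with_generate values are assumptions; the Adam betas and epsilon listed above are the library defaults, so they need not be passed explicitly.

```python
# Hedged sketch of the training configuration implied by the list above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="cs_mT5-large_0.01_100_v0.1",  # hypothetical output directory
    learning_rate=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    evaluation_strategy="epoch",   # assumption: the results table reports one eval per epoch
    predict_with_generate=True,    # assumption: required for the Bleu/Gen Len columns
)
```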

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu   | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:------:|:-------:|
| 9.5426 | 1.0 | 6 | 13.2737 | 0.0 | 19.0 |
| 9.5598 | 2.0 | 12 | 57.9184 | 0.2088 | 19.0 |
| 8.9859 | 3.0 | 18 | 7.1357 | 0.2088 | 19.0 |
| 4.8419 | 4.0 | 24 | 6.5896 | 0.0 | 2.0 |
| 6.0099 | 5.0 | 30 | 5.9797 | 0.0 | 19.0 |
| 5.4071 | 6.0 | 36 | 6.0228 | 0.0 | 19.0 |
| 5.4716 | 7.0 | 42 | 6.0132 | 0.0 | 19.0 |
| 5.6419 | 8.0 | 48 | 5.9242 | 0.0 | 19.0 |
| 5.7044 | 9.0 | 54 | 6.0529 | 0.0 | 19.0 |
| 5.7007 | 10.0 | 60 | 5.8730 | 0.0 | 19.0 |
| 5.0052 | 11.0 | 66 | 6.0392 | 0.0 | 19.0 |
| 6.3889 | 12.0 | 72 | 6.0776 | 0.0 | 19.0 |
| 5.2703 | 13.0 | 78 | 70.6639 | 0.0 | 19.0 |
| 7.1444 | 14.0 | 84 | 7.6067 | 0.0 | 19.0 |
| 4.7785 | 15.0 | 90 | 6.5610 | 0.0 | 19.0 |
| 5.6738 | 16.0 | 96 | 6.0522 | 0.0 | 19.0 |
| 5.5087 | 17.0 | 102 | 6.0558 | 0.0 | 19.0 |
| 5.4367 | 18.0 | 108 | 5.9737 | 0.0 | 19.0 |
| 5.5081 | 19.0 | 114 | 6.0431 | 0.0 | 19.0 |
| 5.2506 | 20.0 | 120 | 5.9623 | 0.0 | 19.0 |
| 5.354 | 21.0 | 126 | 6.0081 | 0.0 | 19.0 |
| 5.5891 | 22.0 | 132 | 5.9859 | 0.0 | 19.0 |
| 5.2457 | 23.0 | 138 | 5.9296 | 0.0 | 19.0 |
| 4.9566 | 24.0 | 144 | 6.0038 | 0.0 | 19.0 |
| 5.3327 | 25.0 | 150 | 6.0421 | 0.0 | 19.0 |
| 4.946 | 26.0 | 156 | 6.0225 | 0.0 | 19.0 |
| 5.1903 | 27.0 | 162 | 5.9587 | 0.0 | 19.0 |
| 5.0797 | 28.0 | 168 | 5.9780 | 0.0 | 19.0 |
| 4.8033 | 29.0 | 174 | 6.0577 | 0.0 | 19.0 |
| 5.559 | 30.0 | 180 | 6.0250 | 0.0 | 19.0 |
| 5.7859 | 31.0 | 186 | 5.9493 | 0.0 | 19.0 |
| 5.4172 | 32.0 | 192 | 6.0647 | 0.0 | 19.0 |
| 4.9906 | 33.0 | 198 | 6.0617 | 0.0 | 19.0 |
| 4.9745 | 34.0 | 204 | 5.9800 | 0.0 | 19.0 |
| 5.2086 | 35.0 | 210 | 5.9942 | 0.0 | 19.0 |
| 5.7047 | 36.0 | 216 | 5.9996 | 0.0 | 19.0 |
| 4.4275 | 37.0 | 222 | 6.0826 | 0.0 | 19.0 |
| 4.9545 | 38.0 | 228 | 6.0865 | 0.0 | 19.0 |
| 5.1466 | 39.0 | 234 | 5.9571 | 0.0 | 19.0 |
| 5.5095 | 40.0 | 240 | 5.9970 | 0.0 | 19.0 |
| 5.1998 | 41.0 | 246 | 5.9978 | 0.0 | 19.0 |
| 4.8406 | 42.0 | 252 | 6.0314 | 0.0 | 19.0 |
| 5.0467 | 43.0 | 258 | 6.0444 | 0.0 | 19.0 |
| 5.2282 | 44.0 | 264 | 6.0295 | 0.0 | 19.0 |
| 4.8847 | 45.0 | 270 | 6.0284 | 0.0 | 19.0 |
| 5.5734 | 46.0 | 276 | 6.0598 | 0.0 | 19.0 |
| 4.743 | 47.0 | 282 | 6.0396 | 0.0 | 19.0 |
| 5.3795 | 48.0 | 288 | 6.0567 | 0.0 | 19.0 |
| 4.9066 | 49.0 | 294 | 6.0615 | 0.0 | 19.0 |
| 4.9682 | 50.0 | 300 | 6.1018 | 0.0 | 19.0 |
| 4.828 | 51.0 | 306 | 6.0605 | 0.0 | 19.0 |
| 4.5153 | 52.0 | 312 | 6.0531 | 0.0 | 19.0 |
| 5.2316 | 53.0 | 318 | 5.9855 | 0.0 | 19.0 |
| 4.8071 | 54.0 | 324 | 6.0292 | 0.0 | 19.0 |
| 5.106 | 55.0 | 330 | 6.0541 | 0.0 | 19.0 |
| 4.9581 | 56.0 | 336 | 5.9499 | 0.0 | 19.0 |
| 4.8037 | 57.0 | 342 | 6.1083 | 0.0 | 19.0 |
| 4.7738 | 58.0 | 348 | 6.0111 | 0.0 | 19.0 |
| 5.3786 | 59.0 | 354 | 6.0164 | 0.0 | 19.0 |
| 4.8782 | 60.0 | 360 | 5.9442 | 0.0 | 19.0 |
| 4.8589 | 61.0 | 366 | 5.9036 | 0.8171 | 19.0 |
| 4.8486 | 62.0 | 372 | 5.7896 | 0.8171 | 19.0 |
| 4.4303 | 63.0 | 378 | 5.8475 | 0.8171 | 19.0 |
| 5.116 | 64.0 | 384 | 5.7361 | 0.8171 | 19.0 |
| 4.9206 | 65.0 | 390 | 5.7211 | 0.8171 | 19.0 |
| 4.5294 | 66.0 | 396 | 5.6845 | 0.8171 | 19.0 |
| 5.0969 | 67.0 | 402 | 5.6964 | 0.8171 | 19.0 |
| 4.4403 | 68.0 | 408 | 5.7035 | 0.8171 | 19.0 |
| 4.3498 | 69.0 | 414 | 5.7088 | 0.8171 | 19.0 |
| 5.0456 | 70.0 | 420 | 5.6742 | 0.8171 | 19.0 |
| 4.9812 | 71.0 | 426 | 5.6820 | 0.8171 | 19.0 |
| 4.4053 | 72.0 | 432 | 5.7010 | 0.8171 | 19.0 |
| 4.8459 | 73.0 | 438 | 5.8511 | 0.8171 | 19.0 |
| 4.3272 | 74.0 | 444 | 5.7204 | 0.8171 | 19.0 |
| 4.4791 | 75.0 | 450 | 5.7542 | 0.8171 | 19.0 |
| 4.5272 | 76.0 | 456 | 5.7444 | 0.8171 | 19.0 |
| 4.2581 | 77.0 | 462 | 5.7456 | 0.879 | 19.0 |
| 4.718 | 78.0 | 468 | 5.7187 | 0.8171 | 19.0 |
| 4.3661 | 79.0 | 474 | 5.8472 | 0.8291 | 19.0 |
| 4.8016 | 80.0 | 480 | 5.7478 | 0.8171 | 19.0 |
| 4.1973 | 81.0 | 486 | 5.8850 | 0.8171 | 19.0 |
| 4.0916 | 82.0 | 492 | 5.7678 | 0.8171 | 19.0 |
| 4.1624 | 83.0 | 498 | 5.8662 | 0.8171 | 19.0 |
| 4.2458 | 84.0 | 504 | 5.9224 | 0.8171 | 19.0 |
| 3.7141 | 85.0 | 510 | 5.8928 | 0.8171 | 19.0 |
| 3.5796 | 86.0 | 516 | 6.0489 | 0.937 | 19.0 |
| 4.8417 | 87.0 | 522 | 6.1602 | 0.8171 | 19.0 |
| 4.3568 | 88.0 | 528 | 5.9343 | 0.8171 | 19.0 |
| 4.6028 | 89.0 | 534 | 5.9039 | 0.8171 | 19.0 |
| 3.6638 | 90.0 | 540 | 6.1188 | 0.879 | 19.0 |
| 4.1465 | 91.0 | 546 | 6.0166 | 0.8171 | 19.0 |
| 4.32 | 92.0 | 552 | 6.0690 | 0.8171 | 19.0 |
| 4.0945 | 93.0 | 558 | 6.0812 | 0.8171 | 19.0 |
| 3.9572 | 94.0 | 564 | 5.9877 | 0.8171 | 19.0 |
| 3.9032 | 95.0 | 570 | 6.0960 | 0.2223 | 19.0 |
| 4.3571 | 96.0 | 576 | 6.1585 | 0.8171 | 19.0 |
| 3.768 | 97.0 | 582 | 6.1953 | 0.8171 | 19.0 |
| 3.94 | 98.0 | 588 | 6.2025 | 0.8171 | 19.0 |
| 3.8452 | 99.0 | 594 | 6.2129 | 0.8171 | 19.0 |
| 4.4174 | 100.0 | 600 | 6.2112 | 0.8171 | 19.0 |

Framework versions

  • Transformers 4.35.2
  • Pytorch 1.13.1+cu117
  • Datasets 2.17.0
  • Tokenizers 0.15.2
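
One way to approximate this environment is to pin the versions listed above; the CUDA 11.7 wheel index for PyTorch is an assumption based on the +cu117 build tag.

```bash
pip install transformers==4.35.2 datasets==2.17.0 tokenizers==0.15.2
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
```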