metadata
base_model: ai-forever/rugpt3medium_based_on_gpt2
tags:
  - generated_from_trainer
model-index:
  - name: my_rugpt3medium_finetune
    results: []

my_rugpt3medium_finetune

This model is a fine-tuned version of ai-forever/rugpt3medium_based_on_gpt2. The training dataset is not specified. It achieves the following result on the evaluation set:

  • Loss: 0.9955
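
The loss above can be turned into a perplexity figure, which is often easier to compare across language models. This is a minimal sketch that assumes the reported value is the mean per-token cross-entropy in nats (the usual convention for causal language models, though this card does not state it):

```python
import math

# Evaluation loss reported in this card.
eval_loss = 0.9955

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.3f}")
```

If the loss were measured in some other unit or averaged differently, this conversion would not apply.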

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 3
  • total_train_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 35
  • mixed_precision_training: Native AMP
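
As a rough illustration, the listed values could map onto `transformers.TrainingArguments` fields as sketched below. The field mapping, the `fp16` choice, and the `output_dir` name are assumptions, not the author's actual training script. The total batch size of 24 equals 8 × 3, which is consistent with training on a single device.

```python
# Hyperparameters copied from the list above, keyed by the
# corresponding transformers.TrainingArguments field names.
training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 3,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 1000,
    "num_train_epochs": 35,
    "fp16": True,  # assumption: "Native AMP" meant fp16 rather than bf16
}

# Assumption: a single device, so the effective batch size is the
# per-device batch size times the gradient accumulation steps.
num_devices = 1
total_train_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_devices
)
print(total_train_batch_size)


def build_training_args(output_dir="my_rugpt3medium_finetune"):
    """Build TrainingArguments from the values above.

    Requires transformers to be installed; fp16=True typically
    also requires a GPU. The output_dir default is hypothetical.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **training_kwargs)
```

The import is kept inside `build_training_args` so the arithmetic can be checked without installing `transformers`.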

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 3.5373 | 0.46 | 25 | 3.4828 |
| 3.5265 | 0.93 | 50 | 3.4708 |
| 3.478 | 1.39 | 75 | 3.4398 |
| 3.4851 | 1.85 | 100 | 3.3995 |
| 3.4407 | 2.31 | 125 | 3.3609 |
| 3.3731 | 2.78 | 150 | 3.3241 |
| 3.3584 | 3.24 | 175 | 3.2886 |
| 3.3267 | 3.7 | 200 | 3.2540 |
| 3.3043 | 4.17 | 225 | 3.2200 |
| 3.229 | 4.63 | 250 | 3.1853 |
| 3.2618 | 5.09 | 275 | 3.1508 |
| 3.1823 | 5.56 | 300 | 3.1164 |
| 3.172 | 6.02 | 325 | 3.0779 |
| 3.1354 | 6.48 | 350 | 3.0395 |
| 3.0899 | 6.94 | 375 | 2.9987 |
| 3.0741 | 7.41 | 400 | 2.9577 |
| 3.009 | 7.87 | 425 | 2.9140 |
| 2.9598 | 8.33 | 450 | 2.8737 |
| 2.9187 | 8.8 | 475 | 2.8294 |
| 2.9378 | 9.26 | 500 | 2.7842 |
| 2.8396 | 9.72 | 525 | 2.7374 |
| 2.8608 | 10.19 | 550 | 2.6889 |
| 2.7296 | 10.65 | 575 | 2.6405 |
| 2.7452 | 11.11 | 600 | 2.5926 |
| 2.6882 | 11.57 | 625 | 2.5389 |
| 2.6463 | 12.04 | 650 | 2.4893 |
| 2.572 | 12.5 | 675 | 2.4356 |
| 2.5384 | 12.96 | 700 | 2.3788 |
| 2.5246 | 13.43 | 725 | 2.3296 |
| 2.4055 | 13.89 | 750 | 2.2747 |
| 2.3759 | 14.35 | 775 | 2.2155 |
| 2.3351 | 14.81 | 800 | 2.1606 |
| 2.286 | 15.28 | 825 | 2.1061 |
| 2.2694 | 15.74 | 850 | 2.0504 |
| 2.1745 | 16.2 | 875 | 1.9967 |
| 2.1053 | 16.67 | 900 | 1.9411 |
| 2.1184 | 17.13 | 925 | 1.8878 |
| 2.0107 | 17.59 | 950 | 1.8362 |
| 2.027 | 18.06 | 975 | 1.7854 |
| 1.9153 | 18.52 | 1000 | 1.7304 |
| 1.9267 | 18.98 | 1025 | 1.6854 |
| 1.8131 | 19.44 | 1050 | 1.6331 |
| 1.8405 | 19.91 | 1075 | 1.5839 |
| 1.7294 | 20.37 | 1100 | 1.5370 |
| 1.7154 | 20.83 | 1125 | 1.4971 |
| 1.6573 | 21.3 | 1150 | 1.4476 |
| 1.6391 | 21.76 | 1175 | 1.4130 |
| 1.5497 | 22.22 | 1200 | 1.3727 |
| 1.5194 | 22.69 | 1225 | 1.3378 |
| 1.535 | 23.15 | 1250 | 1.3000 |
| 1.4514 | 23.61 | 1275 | 1.2714 |
| 1.4711 | 24.07 | 1300 | 1.2388 |
| 1.4105 | 24.54 | 1325 | 1.2136 |
| 1.4202 | 25.0 | 1350 | 1.1890 |
| 1.3351 | 25.46 | 1375 | 1.1679 |
| 1.3575 | 25.93 | 1400 | 1.1440 |
| 1.2882 | 26.39 | 1425 | 1.1202 |
| 1.3378 | 26.85 | 1450 | 1.1074 |
| 1.3094 | 27.31 | 1475 | 1.0864 |
| 1.2793 | 27.78 | 1500 | 1.0743 |
| 1.2377 | 28.24 | 1525 | 1.0626 |
| 1.2693 | 28.7 | 1550 | 1.0468 |
| 1.2157 | 29.17 | 1575 | 1.0368 |
| 1.2007 | 29.63 | 1600 | 1.0263 |
| 1.2376 | 30.09 | 1625 | 1.0221 |
| 1.2216 | 30.56 | 1650 | 1.0136 |
| 1.1923 | 31.02 | 1675 | 1.0102 |
| 1.2143 | 31.48 | 1700 | 1.0039 |
| 1.1764 | 31.94 | 1725 | 1.0014 |
| 1.1654 | 32.41 | 1750 | 0.9990 |
| 1.2031 | 32.87 | 1775 | 0.9976 |
| 1.1952 | 33.33 | 1800 | 0.9965 |
| 1.1852 | 33.8 | 1825 | 0.9961 |
| 1.1737 | 34.26 | 1850 | 0.9959 |
| 1.1609 | 34.72 | 1875 | 0.9955 |
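
The step and epoch columns help put the learning-rate schedule in context. Step 1350 is logged at epoch 25.0, which implies about 54 optimizer steps per epoch and roughly 1890 steps over 35 epochs. The sketch below uses those inferred numbers (they are not stated in the card) with the standard linear-warmup-then-cosine-decay formula to estimate the learning rate at a given step:

```python
import math

PEAK_LR = 1e-5        # learning_rate from the card
WARMUP_STEPS = 1000   # lr_scheduler_warmup_steps from the card
STEPS_PER_EPOCH = 54  # inferred: step 1350 is logged at epoch 25.0
TOTAL_STEPS = STEPS_PER_EPOCH * 35  # assumes all 35 epochs ran


def lr_at(step):
    """Estimated learning rate after `step` optimizer steps.

    Linear warmup from 0 to PEAK_LR, then cosine decay to 0.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (25, 500, 1000, 1500, 1875):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the warmup covers more than half of the run, so the peak learning rate would only be reached around epoch 18.5, and the flattening of the validation loss near the end coincides with the learning rate decaying toward zero.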

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.0
  • Tokenizers 0.15.0
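
To compare a local environment against the versions listed above, something like the following could be used. It is a sketch only: the distribution names (in particular `torch` for Pytorch) are assumptions about how the packages were installed, and the helper `version_mismatches` is hypothetical.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
# Assumption: "Pytorch" corresponds to the "torch" distribution.
CARD_VERSIONS = {
    "transformers": "4.35.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.16.0",
    "tokenizers": "0.15.0",
}


def version_mismatches(expected=CARD_VERSIONS):
    """Return {package: (expected, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in version_mismatches().items():
    print(f"{name}: card lists {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one used for training.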