distilgpt_new_0060

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Train Loss: 1.1173
  • Validation Loss: 1.0714
  • Epoch: 59
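A minimal loading sketch, assuming the checkpoint is available as a TensorFlow causal language model (consistent with the framework versions listed below). The repo id is taken from the card title and is an assumption; substitute the full Hub path if different:

    from transformers import AutoTokenizer, TFAutoModelForCausalLM

    # Hypothetical repo id taken from the card title; replace with the
    # actual Hub path (e.g. "<user>/distilgpt_new_0060") if needed.
    model_id = "distilgpt_new_0060"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = TFAutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer("Hello, my name is", return_tensors="tf")
    output_ids = model.generate(**inputs, max_length=30)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))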

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
  • training_precision: float32
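As a sketch, the serialized optimizer config above maps directly onto the AdamWeightDecay class shipped with Transformers' TensorFlow utilities (the 'decay': 0.0 entry is the legacy Keras learning-rate decay field, left at its default):

    from transformers import AdamWeightDecay

    # Reconstruction of the optimizer from the hyperparameter dump above.
    optimizer = AdamWeightDecay(
        learning_rate=2e-5,
        beta_1=0.9,
        beta_2=0.999,
        epsilon=1e-7,
        amsgrad=False,
        weight_decay_rate=0.01,
    )

    # training_precision is float32, TensorFlow's default, so no
    # mixed-precision policy needs to be set before compiling the model.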

Training results

Train Loss    Validation Loss    Epoch
3.5889        2.6197             0
2.4784        2.2040             1
2.1855        1.9980             2
2.0181        1.8643             3
1.9031        1.7652             4
1.8166        1.6924             5
1.7467        1.6360             6
1.6904        1.5843             7
1.6430        1.5421             8
1.6021        1.5059             9
1.5668        1.4761             10
1.5359        1.4481             11
1.5071        1.4220             12
1.4841        1.4020             13
1.4608        1.3797             14
1.4399        1.3595             15
1.4213        1.3426             16
1.4031        1.3266             17
1.3875        1.3113             18
1.3735        1.3024             19
1.3600        1.2871             20
1.3456        1.2753             21
1.3336        1.2648             22
1.3214        1.2539             23
1.3103        1.2451             24
1.3005        1.2335             25
1.2905        1.2258             26
1.2815        1.2179             27
1.2728        1.2123             28
1.2643        1.2029             29
1.2564        1.1980             30
1.2494        1.1877             31
1.2414        1.1806             32
1.2348        1.1788             33
1.2290        1.1699             34
1.2209        1.1654             35
1.2156        1.1575             36
1.2110        1.1537             37
1.2046        1.1499             38
1.1986        1.1436             39
1.1940        1.1408             40
1.1877        1.1356             41
1.1830        1.1314             42
1.1779        1.1278             43
1.1737        1.1211             44
1.1692        1.1192             45
1.1647        1.1163             46
1.1611        1.1107             47
1.1560        1.1066             48
1.1521        1.1060             49
1.1489        1.1002             50
1.1440        1.0960             51
1.1406        1.0931             52
1.1373        1.0897             53
1.1329        1.0855             54
1.1302        1.0842             55
1.1265        1.0818             56
1.1237        1.0784             57
1.1204        1.0737             58
1.1173        1.0714             59
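
Assuming these losses are mean per-token cross-entropy in nats (the standard causal-LM objective), the final validation loss of 1.0714 corresponds to a perplexity of exp(1.0714) ≈ 2.92, down from exp(2.6197) ≈ 13.7 after the first epoch.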

Framework versions

  • Transformers 4.20.1
  • TensorFlow 2.8.2
  • Datasets 2.3.2
  • Tokenizers 0.12.1
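
One possible way to pin a matching environment (assuming a Python version compatible with TensorFlow 2.8, e.g. 3.7-3.10):

    pip install transformers==4.20.1 tensorflow==2.8.2 datasets==2.3.2 tokenizers==0.12.1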