---
tags:
- generated_from_keras_callback
model-index:
- name: distilgpt_new_0060
  results: []
---

<!-- This model card has been generated automatically according to the information Keras had access to. You should
probably proofread and complete it, then remove this comment. -->

# distilgpt_new_0060

This model was trained from scratch on an unknown dataset.
After the final training epoch (epoch 59, counting from 0), it reports the following training and validation losses:
- Train Loss: 1.1173
- Validation Loss: 1.0714
- Epoch: 59

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: float32
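
The optimizer entry above is a Python dict literal, so it can be turned back into constructor arguments. The sketch below is one hedged way to do that: it assumes the optimizer was `transformers.AdamWeightDecay` (the TensorFlow optimizer class that matches the logged `name`), which the card does not confirm because the training script is not included. The helper names `optimizer_kwargs` and `build_optimizer` are hypothetical.

```python
import ast

# The optimizer line exactly as logged in this card.
OPTIMIZER_CONFIG = (
    "{'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, "
    "'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, "
    "'weight_decay_rate': 0.01}"
)


def optimizer_kwargs(config_text):
    """Parse the logged config and split it into (name, constructor kwargs)."""
    config = dict(ast.literal_eval(config_text))
    name = config.pop("name")
    # 'decay' is the legacy Keras learning-rate decay. It is 0.0 here (no decay),
    # so this sketch drops it rather than rely on legacy keyword handling.
    config.pop("decay", None)
    return name, config


def build_optimizer(config_text=OPTIMIZER_CONFIG):
    """Rebuild the optimizer. Assumption: the class is transformers.AdamWeightDecay."""
    # Imported lazily because it needs TensorFlow and Transformers installed.
    from transformers import AdamWeightDecay

    name, kwargs = optimizer_kwargs(config_text)
    if name != "AdamWeightDecay":
        raise ValueError(f"unexpected optimizer name: {name}")
    return AdamWeightDecay(**kwargs)
```

Calling `build_optimizer()` in an environment with the framework versions listed in this card should yield an optimizer with the logged learning rate and weight decay rate. Any learning-rate schedule or warmup used in the original run is not recorded here and is not reproduced.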

### Training results

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 3.5889     | 2.6197          | 0     |
| 2.4784     | 2.2040          | 1     |
| 2.1855     | 1.9980          | 2     |
| 2.0181     | 1.8643          | 3     |
| 1.9031     | 1.7652          | 4     |
| 1.8166     | 1.6924          | 5     |
| 1.7467     | 1.6360          | 6     |
| 1.6904     | 1.5843          | 7     |
| 1.6430     | 1.5421          | 8     |
| 1.6021     | 1.5059          | 9     |
| 1.5668     | 1.4761          | 10    |
| 1.5359     | 1.4481          | 11    |
| 1.5071     | 1.4220          | 12    |
| 1.4841     | 1.4020          | 13    |
| 1.4608     | 1.3797          | 14    |
| 1.4399     | 1.3595          | 15    |
| 1.4213     | 1.3426          | 16    |
| 1.4031     | 1.3266          | 17    |
| 1.3875     | 1.3113          | 18    |
| 1.3735     | 1.3024          | 19    |
| 1.3600     | 1.2871          | 20    |
| 1.3456     | 1.2753          | 21    |
| 1.3336     | 1.2648          | 22    |
| 1.3214     | 1.2539          | 23    |
| 1.3103     | 1.2451          | 24    |
| 1.3005     | 1.2335          | 25    |
| 1.2905     | 1.2258          | 26    |
| 1.2815     | 1.2179          | 27    |
| 1.2728     | 1.2123          | 28    |
| 1.2643     | 1.2029          | 29    |
| 1.2564     | 1.1980          | 30    |
| 1.2494     | 1.1877          | 31    |
| 1.2414     | 1.1806          | 32    |
| 1.2348     | 1.1788          | 33    |
| 1.2290     | 1.1699          | 34    |
| 1.2209     | 1.1654          | 35    |
| 1.2156     | 1.1575          | 36    |
| 1.2110     | 1.1537          | 37    |
| 1.2046     | 1.1499          | 38    |
| 1.1986     | 1.1436          | 39    |
| 1.1940     | 1.1408          | 40    |
| 1.1877     | 1.1356          | 41    |
| 1.1830     | 1.1314          | 42    |
| 1.1779     | 1.1278          | 43    |
| 1.1737     | 1.1211          | 44    |
| 1.1692     | 1.1192          | 45    |
| 1.1647     | 1.1163          | 46    |
| 1.1611     | 1.1107          | 47    |
| 1.1560     | 1.1066          | 48    |
| 1.1521     | 1.1060          | 49    |
| 1.1489     | 1.1002          | 50    |
| 1.1440     | 1.0960          | 51    |
| 1.1406     | 1.0931          | 52    |
| 1.1373     | 1.0897          | 53    |
| 1.1329     | 1.0855          | 54    |
| 1.1302     | 1.0842          | 55    |
| 1.1265     | 1.0818          | 56    |
| 1.1237     | 1.0784          | 57    |
| 1.1204     | 1.0737          | 58    |
| 1.1173     | 1.0714          | 59    |
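
The table can be read more easily with two derived quantities: the epoch-to-epoch change in validation loss and the perplexity implied by a loss value. The sketch below computes both for the last five rows. It assumes the logged loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language modeling but is not stated in this card. The names `LAST_EPOCHS`, `perplexity` and `val_loss_deltas` are illustrative.

```python
import math

# (train_loss, validation_loss) for the last five rows of the table, keyed by epoch.
LAST_EPOCHS = {
    55: (1.1302, 1.0842),
    56: (1.1265, 1.0818),
    57: (1.1237, 1.0784),
    58: (1.1204, 1.0737),
    59: (1.1173, 1.0714),
}


def perplexity(loss_nats):
    """Perplexity implied by a loss value, assuming mean cross-entropy in nats."""
    return math.exp(loss_nats)


def val_loss_deltas(history):
    """Change in validation loss from each epoch to the next (negative = improving)."""
    epochs = sorted(history)
    return {
        later: round(history[later][1] - history[earlier][1], 4)
        for earlier, later in zip(epochs, epochs[1:])
    }


if __name__ == "__main__":
    for epoch, delta in val_loss_deltas(LAST_EPOCHS).items():
        print(f"epoch {epoch}: validation loss changed by {delta:+.4f}")
    final_val = LAST_EPOCHS[59][1]
    print(f"implied validation perplexity at epoch 59: {perplexity(final_val):.2f}")
```

Under that assumption, the final validation loss of 1.0714 corresponds to a perplexity of roughly 2.92. The deltas for epochs 56 to 59 are all negative, so validation loss was still decreasing when training stopped.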


### Framework versions

- Transformers 4.20.1
- TensorFlow 2.8.2
- Datasets 2.3.2
- Tokenizers 0.12.1
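
To reproduce the run, it may help to check that the local environment matches the versions listed above. The sketch below uses only the standard library. The distribution names in `PINNED` are the usual PyPI names for these packages, and the helper names are hypothetical.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
PINNED = {
    "transformers": "4.20.1",
    "tensorflow": "2.8.2",
    "datasets": "2.3.2",
    "tokenizers": "0.12.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in version_mismatches(PINNED).items():
        print(f"{package}: card lists {wanted}, environment has {found or 'nothing installed'}")
```

An empty result from `version_mismatches(PINNED)` means the environment matches the card. Newer versions may still load the model, but that is not something this card verifies.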