# gptlite_OpenWebText100K
This model was trained on the OpenWebText 100K dataset. It achieves the following results on the evaluation set:
- Loss: 5.3490
## Model description
The model uses the GPT-2 architecture.
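As a usage sketch, the checkpoint can be loaded through the standard Transformers causal-LM API. The repository id below is a placeholder, not the model's confirmed Hub id:

```python
# Minimal usage sketch; the model id is a placeholder (assumption) --
# substitute the actual Hub repository id for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/gptlite_OpenWebText100K"  # placeholder id (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```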
## Intended uses & limitations
The main limitation of this model is that its loss is still quite high, so it is not suitable for practical text generation.
## Training and evaluation data
See the table under Training results below for the training and evaluation loss over the course of training.
## Training procedure
- Dataset: openwebtext100k
- Training hardware: 2x T4 GPUs from Kaggle
- Workflow: download the dataset, set up the hyperparameters, then train (a sketch of the data-loading step follows this list)
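A minimal sketch of the data-loading and tokenization step. Both the exact Hub dataset id and the use of the GPT-2 tokenizer are assumptions, since the card only names the dataset "openwebtext100k":

```python
# Sketch of "download dataset -> tokenize"; the dataset id, the text column
# name, and the 512-token block size are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

raw = load_dataset("openwebtext100k")  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumed GPT-2 tokenizer
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
```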
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
- mixed_precision_training: Native AMP
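For reference, here is how these hyperparameters would map onto Hugging Face `TrainingArguments`. This is a sketch, not the actual training script; `output_dir` and the evaluation cadence are assumptions (the results table reports an evaluation every 1000 steps):

```python
# Sketch mapping the hyperparameters above onto TrainingArguments.
# Adam betas=(0.9, 0.999) and epsilon=1e-8 are the Transformers defaults,
# so they need no explicit arguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gptlite_OpenWebText100K",  # placeholder (assumption)
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=5,
    fp16=True,                      # "Native AMP" mixed-precision training
    evaluation_strategy="steps",
    eval_steps=1000,                # matches the 1000-step cadence in the results table
)
```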
### Training results
| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 7.2717        | 0.36  | 1000  | 6.6089          |
| 6.4412        | 0.71  | 2000  | 6.2425          |
| 6.1733        | 1.07  | 3000  | 6.0212          |
| 5.9827        | 1.42  | 4000  | 5.8614          |
| 5.8549        | 1.78  | 5000  | 5.7380          |
| 5.7444        | 2.13  | 6000  | 5.6440          |
| 5.6548        | 2.49  | 7000  | 5.5686          |
| 5.5952        | 2.84  | 8000  | 5.5093          |
| 5.5363        | 3.2   | 9000  | 5.4604          |
| 5.4867        | 3.55  | 10000 | 5.4216          |
| 5.4578        | 3.91  | 11000 | 5.3911          |
| 5.4288        | 4.27  | 12000 | 5.3697          |
| 5.4082        | 4.62  | 13000 | 5.3555          |
| 5.4009        | 4.98  | 14000 | 5.3490          |
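Assuming the reported loss is the Trainer's default natural-log cross-entropy per token, the final validation loss translates to a perplexity of roughly exp(5.3490) ≈ 210:

```python
# Convert the final validation loss (cross-entropy in nats/token) to perplexity.
import math

final_val_loss = 5.3490
perplexity = math.exp(final_val_loss)
print(f"Validation perplexity: {perplexity:.1f}")  # ~210.4
```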
### Framework versions
- Transformers 4.38.1
- Pytorch 2.1.2
- Datasets 2.17.1
- Tokenizers 0.15.1