---
license: apache-2.0
base_model: Karzan/gpt2-walamakan
tags:
- generated_from_trainer
model-index:
- name: gpt2-walamakan-2
  results: []
---

# gpt2-walamakan-2

This model is a fine-tuned version of [Karzan/gpt2-walamakan](https://huggingface.co/Karzan/gpt2-walamakan) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 6.7392
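
The card does not include usage code, so the following is only a minimal inference sketch using the standard Transformers AutoClasses. It assumes the checkpoint is published on the Hugging Face Hub as `Karzan/gpt2-walamakan-2`; the prompt text and generation settings are placeholders, not part of the original card.

```python
# Minimal inference sketch (assumes the checkpoint id "Karzan/gpt2-walamakan-2").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Karzan/gpt2-walamakan-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Example prompt"  # replace with your own text
inputs = tokenizer(prompt, return_tensors="pt")

# Illustrative generation settings; tune for your use case.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```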

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 30
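
The training script itself is not included in this card, so the block below is only a hedged sketch of how the values above map onto `transformers.TrainingArguments`; the output directory is a placeholder. The effective batch size of 32 comes from the per-device batch size of 8 multiplied by 4 gradient-accumulation steps.

```python
# Rough TrainingArguments sketch mirroring the listed hyperparameters.
# The output_dir is a placeholder, not taken from the original card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-walamakan-2",
    learning_rate=3e-05,
    per_device_train_batch_size=8,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    seed=42,
    gradient_accumulation_steps=4,   # 8 * 4 = 32 effective train batch size
    lr_scheduler_type="linear",
    num_train_epochs=30,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```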

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.523         | 1.0   | 94   | 6.4968          |
| 0.5212        | 2.0   | 188  | 6.4501          |
| 0.488         | 3.0   | 282  | 6.4814          |
| 0.4723        | 4.0   | 376  | 6.5004          |
| 0.4452        | 5.0   | 470  | 6.5328          |
| 0.4442        | 6.0   | 564  | 6.5507          |
| 0.4147        | 7.0   | 658  | 6.5598          |
| 0.397         | 8.0   | 752  | 6.5623          |
| 0.3868        | 9.0   | 846  | 6.5642          |
| 0.3686        | 10.0  | 940  | 6.5713          |
| 0.3553        | 11.0  | 1034 | 6.6027          |
| 0.338         | 12.0  | 1128 | 6.5953          |
| 0.3344        | 13.0  | 1222 | 6.6386          |
| 0.315         | 14.0  | 1316 | 6.6202          |
| 0.3096        | 15.0  | 1410 | 6.6239          |
| 0.2961        | 16.0  | 1504 | 6.6648          |
| 0.2899        | 17.0  | 1598 | 6.6663          |
| 0.2782        | 18.0  | 1692 | 6.6750          |
| 0.2642        | 19.0  | 1786 | 6.6777          |
| 0.2541        | 20.0  | 1880 | 6.6807          |
| 0.2502        | 21.0  | 1974 | 6.6956          |
| 0.2453        | 22.0  | 2068 | 6.7099          |
| 0.2485        | 23.0  | 2162 | 6.7159          |
| 0.2342        | 24.0  | 2256 | 6.7149          |
| 0.2226        | 25.0  | 2350 | 6.7288          |
| 0.22          | 26.0  | 2444 | 6.7302          |
| 0.2172        | 27.0  | 2538 | 6.7315          |
| 0.2185        | 28.0  | 2632 | 6.7374          |
| 0.2131        | 29.0  | 2726 | 6.7361          |
| 0.2089        | 30.0  | 2820 | 6.7392          |
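
For intuition, and assuming the reported value is the Trainer's mean per-token cross-entropy, the final validation loss of 6.7392 corresponds to a perplexity of roughly exp(6.7392) ≈ 845.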

### Framework versions

- Transformers 4.32.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.4
- Tokenizers 0.13.3