song_a_day_gpt2_all

This model is a fine-tuned version of gpt2-medium; the training dataset is not specified. On the evaluation set it reaches a validation loss of 3.0565 at the final training step (step 1200).

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 500
training_steps: 1200
mixed_precision_training: Native AMP
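
To make the interaction between the batch settings and the learning-rate schedule concrete, here is a minimal sketch in plain Python. It assumes a single training device (the card does not state the device count) and a schedule that warms up linearly and then decays along a half cosine to zero, which is the usual meaning of a cosine scheduler with warmup. The names `NUM_DEVICES`, `effective_batch_size`, and `lr_at` are illustrative and do not come from the training code.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 500
TRAINING_STEPS = 1200

# Assumption: one device. The card does not state the device count.
NUM_DEVICES = 1


def effective_batch_size() -> int:
    """Sequences contributing to each optimizer update."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Linear warmup, then half-cosine decay to zero (illustrative sketch)."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 250, 500, 850, 1200):
        print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 16. Warmup takes 500 of the 1200 steps, so the peak rate of 5e-05 is reached only at step 500 and then decays toward zero by step 1200.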

Training results

Training Loss	Epoch	Step	Validation Loss
3.7649	0.1550	50	3.4886
3.6032	0.3101	100	3.3497
3.4405	0.4651	150	3.2716
3.3981	0.6202	200	3.2304
3.3739	0.7752	250	3.2005
3.3871	0.9302	300	3.1779
3.1784	1.0853	350	3.1585
3.1311	1.2403	400	3.1418
3.1673	1.3953	450	3.1298
3.2047	1.5504	500	3.1215
3.1322	1.7054	550	3.1100
3.1048	1.8605	600	3.0982
3.1359	2.0155	650	3.0885
2.9576	2.1705	700	3.0837
2.9204	2.3256	750	3.0745
2.9127	2.4806	800	3.0654
2.8982	2.6357	850	3.0628
3.0112	2.7907	900	3.0554
2.9847	2.9457	950	3.0466
2.7827	3.1008	1000	3.0590
2.7837	3.2558	1050	3.0573
2.8772	3.4109	1100	3.0577
2.8217	3.5659	1150	3.0564
2.8130	3.7209	1200	3.0565
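
The table can be summarized with a few lines of Python. The sketch below copies the rows as they appear above and derives three quantities: the step with the lowest validation loss, the perplexity implied by a loss value, and a rough count of optimizer steps per epoch. It assumes the reported loss is a mean per-token cross-entropy in nats, which is what makes `exp(loss)` a perplexity; the helper names are illustrative.

```python
import math

# (training_loss, epoch, step, validation_loss), copied from the table above.
ROWS = [
    (3.7649, 0.1550, 50, 3.4886),
    (3.6032, 0.3101, 100, 3.3497),
    (3.4405, 0.4651, 150, 3.2716),
    (3.3981, 0.6202, 200, 3.2304),
    (3.3739, 0.7752, 250, 3.2005),
    (3.3871, 0.9302, 300, 3.1779),
    (3.1784, 1.0853, 350, 3.1585),
    (3.1311, 1.2403, 400, 3.1418),
    (3.1673, 1.3953, 450, 3.1298),
    (3.2047, 1.5504, 500, 3.1215),
    (3.1322, 1.7054, 550, 3.1100),
    (3.1048, 1.8605, 600, 3.0982),
    (3.1359, 2.0155, 650, 3.0885),
    (2.9576, 2.1705, 700, 3.0837),
    (2.9204, 2.3256, 750, 3.0745),
    (2.9127, 2.4806, 800, 3.0654),
    (2.8982, 2.6357, 850, 3.0628),
    (3.0112, 2.7907, 900, 3.0554),
    (2.9847, 2.9457, 950, 3.0466),
    (2.7827, 3.1008, 1000, 3.0590),
    (2.7837, 3.2558, 1050, 3.0573),
    (2.8772, 3.4109, 1100, 3.0577),
    (2.8217, 3.5659, 1150, 3.0564),
    (2.8130, 3.7209, 1200, 3.0565),
]


def best_checkpoint(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def perplexity(loss: float) -> float:
    """Perplexity, assuming the loss is mean per-token cross-entropy in nats."""
    return math.exp(loss)


def steps_per_epoch(rows) -> float:
    """Rough estimate from the last logged row: step divided by epoch."""
    _, epoch, step, _ = rows[-1]
    return step / epoch


if __name__ == "__main__":
    _, _, step, val_loss = best_checkpoint(ROWS)
    print(f"lowest validation loss: {val_loss} at step {step}")
    print(f"final validation perplexity: {perplexity(ROWS[-1][3]):.2f}")
    print(f"approx. optimizer steps per epoch: {steps_per_epoch(ROWS):.1f}")
```

On these rows the lowest validation loss is 3.0466 at step 950. After that point the validation loss stays near 3.056 while the training loss keeps falling, which may indicate mild overfitting in the final quarter of training.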