Vit-GPT2-COCO2017Flickr-115k-12

This model is a fine-tuned version of NourFakih/Vit-GPT2-COCO2017Flickr-115k-12 on an unknown dataset. It achieves the following results on the evaluation set (a brief usage sketch follows the list):

  • Loss: 0.5262
  • ROUGE-1: 40.547
  • ROUGE-2: 14.9559
  • ROUGE-L: 36.6474
  • ROUGE-Lsum: 36.655
  • Gen Len: 11.9339
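
The card ships no usage snippet, so here is a minimal captioning sketch. It assumes the checkpoint follows the standard VisionEncoderDecoder layout (ViT encoder, GPT-2 decoder) implied by the model name; the image path and the generation settings (beam size, max length) are illustrative placeholders, not values taken from the training run.

```python
# Minimal captioning sketch, assuming the standard ViT + GPT-2
# VisionEncoderDecoder layout; "example.jpg" is a placeholder path.
import torch
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

model_id = "NourFakih/Vit-GPT2-COCO2017Flickr-115k-12"
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

image = Image.open("example.jpg").convert("RGB")  # placeholder image
pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device)

with torch.no_grad():
    # max_length=16 comfortably covers the ~12-token mean Gen Len reported above
    output_ids = model.generate(pixel_values, max_length=16, num_beams=4)

caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```

Beam search is optional here; greedy decoding also works, the beam setting is only an illustrative choice.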

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged Seq2SeqTrainingArguments reconstruction follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
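
These settings map onto Seq2SeqTrainingArguments roughly as sketched below. This is a reconstruction, not the author's script: output_dir and the evaluation cadence are assumptions (the results table suggests evaluation every 500 steps), and the Adam betas/epsilon listed above are the library defaults, so they need no explicit arguments.

```python
# Hedged reconstruction of the training setup from the list above.
# output_dir and the eval cadence are assumptions, not documented values.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="Vit-GPT2-COCO2017Flickr-115k-12",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,   # 4 x 4 = effective train batch size 16
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    seed=42,
    eval_strategy="steps",           # the results table logs every 500 steps
    eval_steps=500,
    predict_with_generate=True,      # required for Gen Len / ROUGE during eval
)
```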

Training results

| Training Loss | Epoch | Step | Gen Len | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|---|---|---|---|---|---|---|---|---|
| 0.3804 | 0.0696 | 500 | 11.8970 | 0.4475 | 41.2997 | 15.8517 | 37.5088 | 37.5135 |
| 0.383 | 0.1391 | 1000 | 11.4465 | 0.4486 | 41.0786 | 15.592 | 37.2557 | 37.247 |
| 0.3804 | 0.2087 | 1500 | 11.7144 | 0.4462 | 41.0521 | 15.5552 | 37.3142 | 37.3142 |
| 0.376 | 0.2783 | 2000 | 11.7237 | 0.4503 | 41.0712 | 15.4215 | 37.1593 | 37.1483 |
| 0.3742 | 0.3478 | 2500 | 12.056 | 0.4424 | 41.0197 | 15.5533 | 37.0838 | 37.0828 |
| 0.3702 | 0.4174 | 3000 | 11.51 | 0.4476 | 41.3443 | 15.849 | 37.5657 | 37.573 |
| 0.3682 | 0.4870 | 3500 | 11.9582 | 0.4470 | 41.5477 | 16.1138 | 37.5701 | 37.5784 |
| 0.367 | 0.5565 | 4000 | 11.3945 | 0.4440 | 41.244 | 15.7945 | 37.5079 | 37.5147 |
| 0.3609 | 0.6261 | 4500 | 11.8043 | 0.4479 | 41.0531 | 15.81 | 37.3273 | 37.3317 |
| 0.3638 | 0.6957 | 5000 | 11.6248 | 0.4414 | 41.3129 | 15.94 | 37.5627 | 37.583 |
| 0.3543 | 0.7652 | 5500 | 11.5961 | 0.4482 | 41.1584 | 15.7318 | 37.4062 | 37.4104 |
| 0.3521 | 0.8348 | 6000 | 11.7494 | 0.4445 | 41.3954 | 15.8539 | 37.6111 | 37.6208 |
| 0.3537 | 0.9043 | 6500 | 11.8193 | 0.4438 | 41.6655 | 16.1228 | 37.7752 | 37.7812 |
| 0.3459 | 0.9739 | 7000 | 11.614 | 0.4427 | 41.4156 | 15.8963 | 37.5839 | 37.5837 |
| 0.3092 | 1.0435 | 7500 | 11.9294 | 0.4620 | 41.2316 | 15.7114 | 37.3554 | 37.3522 |
| 0.2924 | 1.1130 | 8000 | 11.8922 | 0.4626 | 40.8321 | 15.5115 | 37.0416 | 37.0319 |
| 0.2886 | 1.1826 | 8500 | 11.8237 | 0.4692 | 40.9025 | 15.347 | 37.0047 | 36.9942 |
| 0.2901 | 1.2522 | 9000 | 11.8817 | 0.4649 | 41.0739 | 15.3744 | 37.1782 | 37.1775 |
| 0.2868 | 1.3217 | 9500 | 11.8390 | 0.4668 | 41.2378 | 15.6205 | 37.3887 | 37.3857 |
| 0.2825 | 1.3913 | 10000 | 11.6785 | 0.4680 | 41.0405 | 15.5335 | 37.1249 | 37.1293 |
| 0.2843 | 1.4609 | 10500 | 11.792 | 0.4753 | 40.7387 | 15.0593 | 36.8401 | 36.8406 |
| 0.2819 | 1.5305 | 11000 | 11.7928 | 0.4718 | 41.1479 | 15.5299 | 37.2971 | 37.2975 |
| 0.2791 | 1.6001 | 11500 | 11.7897 | 0.4728 | 40.8974 | 15.2068 | 37.0748 | 37.0794 |
| 0.2756 | 1.6696 | 12000 | 11.7343 | 0.4776 | 40.9051 | 15.3769 | 37.1201 | 37.122 |
| 0.275 | 1.7392 | 12500 | 11.6749 | 0.4799 | 41.0987 | 15.4789 | 37.1856 | 37.1813 |
| 0.2703 | 1.8088 | 13000 | 11.6395 | 0.4787 | 41.211 | 15.6686 | 37.34 | 37.3342 |
| 0.2733 | 1.8783 | 13500 | 11.7356 | 0.4808 | 41.1137 | 15.5692 | 37.2625 | 37.2592 |
| 0.27 | 1.9479 | 14000 | 11.7816 | 0.4818 | 41.4823 | 15.8925 | 37.6161 | 37.6165 |
| 0.255 | 2.0175 | 14500 | 11.6703 | 0.5038 | 40.9089 | 15.2776 | 36.9295 | 36.9275 |
| 0.2292 | 2.0871 | 15000 | 11.8069 | 0.5076 | 41.0044 | 15.3659 | 37.0441 | 37.0438 |
| 0.2226 | 2.1567 | 15500 | 11.8068 | 0.5129 | 40.8428 | 15.2553 | 36.9787 | 36.9642 |
| 0.2236 | 2.2262 | 16000 | 11.8470 | 0.5153 | 40.6711 | 15.13 | 36.8865 | 36.8805 |
| 0.2215 | 2.2958 | 16500 | 11.8765 | 0.5200 | 40.7621 | 15.0155 | 36.8522 | 36.85 |
| 0.2215 | 2.3654 | 17000 | 11.7364 | 0.5186 | 40.6314 | 15.1297 | 36.8185 | 36.8123 |
| 0.2202 | 2.4349 | 17500 | 11.9915 | 0.5208 | 40.6588 | 15.0601 | 36.5692 | 36.5715 |
| 0.2126 | 2.5045 | 18000 | 11.8697 | 0.5218 | 40.6073 | 14.8542 | 36.6365 | 36.6408 |
| 0.2178 | 2.5741 | 18500 | 11.9752 | 0.5221 | 40.2978 | 14.671 | 36.4154 | 36.4278 |
| 0.2179 | 2.6436 | 19000 | 11.9277 | 0.5201 | 40.5832 | 14.9498 | 36.6925 | 36.7034 |
| 0.2165 | 2.7132 | 19500 | 11.9769 | 0.5252 | 40.5697 | 15.0438 | 36.7188 | 36.7223 |
| 0.2134 | 2.7827 | 20000 | 11.938 | 0.5269 | 40.6224 | 15.0783 | 36.6861 | 36.6882 |
| 0.2157 | 2.8523 | 20500 | 11.8804 | 0.5289 | 40.4897 | 14.9113 | 36.6069 | 36.6136 |
| 0.2142 | 2.9219 | 21000 | 11.8809 | 0.5260 | 40.4523 | 14.952 | 36.6097 | 36.616 |
| 0.2124 | 2.9914 | 21500 | 11.9339 | 0.5262 | 40.547 | 14.9559 | 36.6474 | 36.655 |
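
The ROUGE and Gen Len columns come from a compute_metrics hook run at each evaluation step. The exact metric code for this run is not published; below is a minimal sketch using the evaluate library, with the pad-token fallback (reusing the EOS token, a common choice for GPT-2 decoders) called out as an assumption.

```python
# Hedged sketch of a ROUGE / Gen Len compute_metrics hook; not the
# author's exact code. Pad-token handling is an assumption.
import evaluate
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NourFakih/Vit-GPT2-COCO2017Flickr-115k-12")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # common choice for GPT-2

rouge = evaluate.load("rouge")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    preds = preds[0] if isinstance(preds, tuple) else preds
    # Labels use -100 for ignored positions; restore a real id before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    result = rouge.compute(
        predictions=decoded_preds, references=decoded_labels, use_stemmer=True
    )
    result = {k: round(v * 100, 4) for k, v in result.items()}

    # Mean generated length in tokens, matching the "Gen Len" column.
    result["gen_len"] = float(
        np.mean([np.count_nonzero(p != tokenizer.pad_token_id) for p in preds])
    )
    return result
```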

Framework versions

  • Transformers 4.42.3
  • PyTorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1