Vit-GPT2-COCO2017Flickr-01

This model is a fine-tuned version of NourFakih/image-captioning-Vit-GPT2-Flickr8k. The training dataset is not specified in the card metadata (it is recorded as None). It achieves the following results on the evaluation set:

  • Loss: 0.2789
  • Rouge1: 40.4777
  • Rouge2: 15.156
  • RougeL: 36.8755
  • RougeLsum: 36.8813
  • Gen Len: 11.92
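
The ROUGE figures above appear to be on a 0-100 scale. As a rough illustration of what Rouge1 measures, the following Python sketch computes a simplified unigram-overlap F1 between a generated caption and a reference. It assumes lowercase whitespace tokenization and omits the stemming and tokenization rules of the evaluation library actually used, so it is not expected to reproduce the reported numbers. The function name `rouge1_f1` and the example captions are hypothetical.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 on a 0-100 scale (whitespace tokens, no stemming)."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical captions: 4 of 6 unigrams overlap in each direction.
score = rouge1_f1("a dog runs on the grass", "a dog is running on grass")
print(f"{score:.2f}")
```

Rouge2 applies the same idea to bigrams, and RougeL replaces the n-gram overlap with the longest common subsequence.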

Model description

More information needed

Intended uses & limitations

More information needed
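
The card does not document usage. Given the model name and its base model (an image-captioning ViT-GPT2 checkpoint), a minimal sketch of caption generation with the Transformers `image-to-text` pipeline might look like the following. It assumes the repository ships the image processor and tokenizer files alongside the weights, and that `transformers`, `torch`, and `Pillow` are installed. The helper name `caption_image`, the default of 16 new tokens, and the file name `example.jpg` are hypothetical choices, not settings taken from this card.

```python
MODEL_ID = "NourFakih/Vit-GPT2-COCO2017Flickr-01"


def caption_image(image_path: str, max_new_tokens: int = 16) -> str:
    """Return one generated caption for the image at image_path (a sketch)."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies to be installed.
    from transformers import pipeline

    captioner = pipeline("image-to-text", model=MODEL_ID)
    # The pipeline returns a list of dicts with a "generated_text" key.
    outputs = captioner(image_path, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"].strip()


# Example call (downloads the model weights on first use):
# print(caption_image("example.jpg"))
```

The 16-token default is only a guess informed by the average generated length of about 12 tokens reported on the evaluation set.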

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
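
To make the `linear` scheduler setting concrete, the sketch below computes the learning rate at a given optimizer step. The total of 19500 steps comes from the last row of the training results table in this card. The warmup length is not listed, so zero warmup is assumed here, and the function name `linear_lr` is hypothetical. The estimate of training examples per epoch further assumes a single device and no gradient accumulation, neither of which the card confirms.

```python
BASE_LR = 5e-05          # learning_rate from the list above
TOTAL_STEPS = 19500      # final step in the training results table
NUM_EPOCHS = 3           # num_epochs from the list above
TRAIN_BATCH_SIZE = 4     # train_batch_size from the list above
WARMUP_STEPS = 0         # assumption: the card does not list warmup


def linear_lr(step: int) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS
# Rough estimate only; assumes one device and no gradient accumulation.
approx_examples_per_epoch = steps_per_epoch * TRAIN_BATCH_SIZE

for step in (0, 6500, 13000, 19500):
    print(step, linear_lr(step))
```

Under these assumptions the rate falls by one third of its initial value per epoch and reaches zero at the final step.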

Training results

Training Loss  Epoch  Step   Gen Len  Validation Loss  Rouge1   Rouge2   RougeL   RougeLsum
0.2185         0.08   500    11.9627  0.2288           41.2368  15.6218  37.5796  37.5754
0.2097         0.15   1000   12.1819  0.2266           41.0126  15.773   37.2736  37.2843
0.2067         0.23   1500   11.1865  0.2260           41.0707  15.534   37.4934  37.5044
0.1997         0.31   2000   11.4404  0.2251           41.5488  15.8208  37.704   37.7153
0.1962         0.38   2500   12.1219  0.2241           41.6067  16.1235  37.8372  37.8403
0.1891         0.46   3000   12.0462  0.2246           41.7488  16.5323  38.0498  38.0689
0.1942         0.54   3500   11.8842  0.2252           41.3542  15.7955  37.8567  37.8759
0.186          0.62   4000   11.6954  0.2256           41.4582  15.8671  37.7381  37.7557
0.1822         0.69   4500   11.6962  0.2253           41.6779  15.8426  37.9166  37.9538
0.1829         0.77   5000   11.695   0.2248           41.8987  16.4174  38.3064  38.321
0.1786         0.85   5500   11.9762  0.2251           40.9742  15.6616  37.3227  37.3401
0.1808         0.92   6000   11.7042  0.2260           41.5023  16.0289  37.9925  37.9843
0.1758         1.0    6500   11.8888  0.2262           41.3528  16.0559  37.8786  37.8588
0.1326         1.08   7000   11.8173  0.2394           40.7818  15.486   37.2677  37.2794
0.1291         1.15   7500   11.7969  0.2412           41.4117  16.2382  37.9863  37.9964
0.1314         1.23   8000   11.7969  0.2436           41.1586  15.5594  37.512   37.5293
0.131          1.31   8500   11.8281  0.2427           41.1027  15.817   37.7167  37.7216
0.1322         1.38   9000   11.8927  0.2400           41.4453  16.0873  37.7242  37.735
0.1237         1.46   9500   11.8035  0.2447           40.704   15.0054  37.1021  37.1102
0.1289         1.54   10000  12.2473  0.2441           41.0159  15.5793  37.1366  37.1673
0.1236         1.62   10500  11.6977  0.2452           40.8137  15.3874  37.1591  37.1672
0.1241         1.69   11000  11.4181  0.2465           40.9985  15.3879  37.1388  37.1634
0.1219         1.77   11500  11.7765  0.2463           41.1345  15.6654  37.3921  37.4082
0.1234         1.85   12000  12.1512  0.2444           41.134   15.7004  37.3621  37.3993
0.1193         1.92   12500  11.6831  0.2466           40.568   15.1806  37.0715  37.0779
0.1148         2.0    13000  11.6546  0.2482           41.0991  15.4567  37.4898  37.5136
0.0836         2.08   13500  12.0708  0.2717           40.4842  15.0195  36.8428  36.859
0.0869         2.15   14000  12.0069  0.2731           40.6828  14.8559  36.8299  36.8515
0.0846         2.23   14500  12.02    0.2727           40.1785  14.8884  36.7155  36.7025
0.0829         2.31   15000  12.0535  0.2756           40.9047  15.2085  37.1447  37.1153
0.0855         2.38   15500  12.0346  0.2757           40.8628  14.9646  37.068   37.0583
0.0859         2.46   16000  11.8796  0.2762           40.924   15.2223  37.1443  37.1329
0.0847         2.54   16500  11.9292  0.2786           40.9447  15.2269  37.1398  37.1511
0.0831         2.62   17000  12.0958  0.2770           40.417   14.7542  36.6568  36.6345
0.0828         2.69   17500  11.845   0.2796           40.7295  15.0389  36.9957  36.9706
0.0782         2.77   18000  11.9369  0.2796           40.7406  15.1238  36.9906  36.9817
0.0798         2.85   18500  11.9869  0.2792           40.4692  15.0458  36.8005  36.7953
0.0794         2.92   19000  11.8985  0.2792           40.497   15.1883  36.8923  36.8945
0.0793         3.0    19500  11.92    0.2789           40.4777  15.156   36.8755  36.8813
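
In the table above, training loss keeps falling while validation loss rises after the first epoch, a pattern usually read as overfitting. The Python sketch below shows one way to compare checkpoints by validation loss and by Rouge1, using a handful of rows copied from the table. Whether the intermediate checkpoints were saved or published is not stated in this card, so this only illustrates the selection logic.

```python
# (step, validation_loss, rouge1) for a subset of rows copied from the table.
rows = [
    (500, 0.2288, 41.2368),
    (2500, 0.2241, 41.6067),
    (5000, 0.2248, 41.8987),
    (6500, 0.2262, 41.3528),
    (13000, 0.2482, 41.0991),
    (19500, 0.2789, 40.4777),
]

# Lower validation loss is better; higher Rouge1 is better.
best_by_loss = min(rows, key=lambda row: row[1])
best_by_rouge1 = max(rows, key=lambda row: row[2])

print("lowest validation loss at step", best_by_loss[0])
print("highest Rouge1 at step", best_by_rouge1[0])
```

These two rows are also the overall minimum validation loss (step 2500) and maximum Rouge1 (step 5000) in the full table. Both fall within the first epoch, while the reported final metrics correspond to step 19500.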

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2

Model size: 239M params (Safetensors, tensor type F32)