
Vit-GPT2-COCO2017Flickr-02

This model is a fine-tuned version of nlpconnect/vit-gpt2-image-captioning on a dataset that is not specified in this card. It achieves the following results on the evaluation set:

  • Loss: 0.2598
  • Rouge1: 41.8246
  • Rouge2: 16.1808
  • Rougel: 38.0947
  • Rougelsum: 38.0582
  • Gen Len: 11.7462
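
As a sketch of how this checkpoint can be used for captioning, the snippet below follows the usage pattern documented for the base model nlpconnect/vit-gpt2-image-captioning; the image path and the generation settings (max_length=16, num_beams=4) are illustrative assumptions, not values taken from this card.

```python
# Minimal captioning sketch, following the base model's documented usage.
# "example.jpg", max_length, and num_beams are illustrative choices.
import torch
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model_id = "NourFakih/Vit-GPT2-COCO2017Flickr-02"
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device)

output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```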

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
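
The training script itself is not included in the card; the following is a hypothetical Seq2SeqTrainingArguments configuration that mirrors the hyperparameters listed above. The output_dir name and the 500-step evaluation cadence (inferred from the results table below) are assumptions; the Adam betas and epsilon listed above are the Trainer defaults.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
# output_dir and the eval cadence (every 500 steps, inferred from the
# results table) are assumptions; Adam betas/epsilon are Trainer defaults.
training_args = Seq2SeqTrainingArguments(
    output_dir="Vit-GPT2-COCO2017Flickr-02",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
    evaluation_strategy="steps",  # argument name valid in Transformers 4.39
    eval_steps=500,
    logging_steps=500,
    predict_with_generate=True,   # needed to compute ROUGE/Gen Len during eval
)
```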

Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 0.2425        | 0.08  | 500   | 0.2258          | 40.7869 | 15.199  | 37.0489 | 37.0626   | 11.6315 |
| 0.2201        | 0.15  | 1000  | 0.2249          | 40.1404 | 14.8742 | 36.584  | 36.5776   | 11.9823 |
| 0.219         | 0.23  | 1500  | 0.2247          | 40.8233 | 15.4793 | 37.2918 | 37.2909   | 11.25   |
| 0.2111        | 0.31  | 2000  | 0.2235          | 40.9526 | 15.2346 | 37.3222 | 37.3373   | 11.3288 |
| 0.2093        | 0.38  | 2500  | 0.2231          | 40.8278 | 15.4807 | 37.0495 | 37.0609   | 12.0504 |
| 0.2029        | 0.46  | 3000  | 0.2237          | 41.0299 | 15.7008 | 37.4951 | 37.4861   | 12.0935 |
| 0.2078        | 0.54  | 3500  | 0.2233          | 40.6441 | 15.5267 | 37.1304 | 37.1546   | 11.7654 |
| 0.1998        | 0.62  | 4000  | 0.2241          | 41.2438 | 15.6237 | 37.3616 | 37.3653   | 11.7535 |
| 0.1963        | 0.69  | 4500  | 0.2237          | 41.5874 | 15.9016 | 38.0843 | 38.1149   | 11.5485 |
| 0.197         | 0.77  | 5000  | 0.2238          | 41.2501 | 16.2728 | 37.4111 | 37.4342   | 11.5915 |
| 0.1924        | 0.85  | 5500  | 0.2249          | 40.8554 | 15.434  | 37.3203 | 37.3119   | 11.86   |
| 0.1957        | 0.92  | 6000  | 0.2248          | 40.695  | 15.3006 | 37.1779 | 37.1898   | 11.8842 |
| 0.1919        | 1.0   | 6500  | 0.2227          | 40.4899 | 15.3529 | 36.9403 | 36.9674   | 11.8185 |
| 0.1502        | 1.08  | 7000  | 0.2332          | 40.9993 | 15.3624 | 37.4968 | 37.5274   | 11.955  |
| 0.1463        | 1.15  | 7500  | 0.2340          | 41.1808 | 16.0105 | 37.7805 | 37.7884   | 11.7792 |
| 0.1503        | 1.23  | 8000  | 0.2364          | 41.3334 | 15.6562 | 37.7087 | 37.7118   | 11.5815 |
| 0.1496        | 1.31  | 8500  | 0.2320          | 41.171  | 15.6112 | 37.4079 | 37.4274   | 11.8477 |
| 0.1491        | 1.38  | 9000  | 0.2328          | 41.0707 | 15.5662 | 37.5235 | 37.5222   | 11.735  |
| 0.1418        | 1.46  | 9500  | 0.2344          | 41.3775 | 16.2084 | 37.8977 | 37.9202   | 11.5685 |
| 0.1474        | 1.54  | 10000 | 0.2326          | 41.4136 | 16.1038 | 37.4991 | 37.5212   | 11.9992 |
| 0.1414        | 1.62  | 10500 | 0.2364          | 41.3191 | 15.8292 | 37.5841 | 37.6033   | 11.9308 |
| 0.1419        | 1.69  | 11000 | 0.2391          | 41.6061 | 16.0641 | 37.9547 | 37.9706   | 11.6719 |
| 0.1398        | 1.77  | 11500 | 0.2342          | 41.9828 | 16.4948 | 38.2849 | 38.3078   | 11.5842 |
| 0.1427        | 1.85  | 12000 | 0.2347          | 41.3131 | 15.7264 | 37.4993 | 37.5159   | 11.9746 |
| 0.1372        | 1.92  | 12500 | 0.2353          | 41.8467 | 16.3585 | 38.1331 | 38.1278   | 11.5858 |
| 0.1322        | 2.0   | 13000 | 0.2368          | 41.8492 | 16.1515 | 38.213  | 38.2573   | 11.3688 |
| 0.1031        | 2.08  | 13500 | 0.2567          | 41.3124 | 15.7976 | 37.6082 | 37.6376   | 11.9769 |
| 0.1061        | 2.15  | 14000 | 0.2532          | 41.651  | 16.1237 | 37.9306 | 37.955    | 12.1223 |
| 0.1036        | 2.23  | 14500 | 0.2571          | 41.3558 | 16.0047 | 37.6471 | 37.668    | 11.8531 |
| 0.1023        | 2.31  | 15000 | 0.2559          | 41.4787 | 15.911  | 37.7424 | 37.7684   | 11.8785 |
| 0.1056        | 2.38  | 15500 | 0.2566          | 41.638  | 16.0218 | 37.9238 | 37.9395   | 11.81   |
| 0.1034        | 2.46  | 16000 | 0.2575          | 41.5721 | 16.2242 | 37.8949 | 37.9075   | 11.8492 |
| 0.1037        | 2.54  | 16500 | 0.2572          | 41.6212 | 15.9041 | 37.9474 | 37.9701   | 11.6635 |
| 0.1017        | 2.62  | 17000 | 0.2565          | 41.4034 | 15.8097 | 37.7397 | 37.7466   | 11.8096 |
| 0.1019        | 2.69  | 17500 | 0.2578          | 41.5811 | 15.9254 | 37.8885 | 37.9191   | 11.7215 |
| 0.0955        | 2.77  | 18000 | 0.2585          | 41.8661 | 16.3595 | 38.3758 | 38.3996   | 11.6642 |
| 0.0975        | 2.85  | 18500 | 0.2599          | 41.5204 | 15.9178 | 37.93   | 37.9513   | 11.8031 |
| 0.0991        | 2.92  | 19000 | 0.2595          | 41.9135 | 16.1875 | 38.1738 | 38.1353   | 11.7381 |
| 0.0975        | 3.0   | 19500 | 0.2598          | 41.8246 | 16.1808 | 38.0947 | 38.0582   | 11.7462 |
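
For reference, ROUGE scores like those in the table can be computed with the `evaluate` library; the sketch below uses placeholder captions and assumes the card's metrics follow the usual Trainer convention of scaling `evaluate`'s [0, 1] scores by 100.

```python
import evaluate

# Sketch of the ROUGE computation; the captions are placeholders.
rouge = evaluate.load("rouge")
predictions = ["a dog runs across the grass"]
references = ["a dog is running on the grass"]
scores = rouge.compute(predictions=predictions, references=references)
# evaluate returns rouge1/rouge2/rougeL/rougeLsum as fractions in [0, 1];
# the table above reports them multiplied by 100.
print(scores)
```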

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2
