
Vit-GPT2-COCO2017Flickr-02

This model is a fine-tuned version of nlpconnect/vit-gpt2-image-captioning on a dataset that is not specified in this card. It achieves the following results on the evaluation set:

  • Loss: 0.2598
  • Rouge1: 41.8246
  • Rouge2: 16.1808
  • Rougel: 38.0947
  • Rougelsum: 38.0582
  • Gen Len: 11.7462
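
As a sketch of how this checkpoint can be used for captioning, the snippet below follows the usage pattern documented for the base model nlpconnect/vit-gpt2-image-captioning; the image path and the generation settings (max_length=16, num_beams=4) are illustrative assumptions, not values taken from this card.

```python
# Minimal captioning sketch, following the base model's documented usage.
# "example.jpg", max_length, and num_beams are illustrative choices.
import torch
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model_id = "NourFakih/Vit-GPT2-COCO2017Flickr-02"
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device)

output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```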

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
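
The training script itself is not included in the card; the following is a hypothetical Seq2SeqTrainingArguments configuration that mirrors the hyperparameters listed above. The output_dir name and the 500-step evaluation cadence (inferred from the results table below) are assumptions; the Adam betas and epsilon listed above are the Trainer defaults.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
# output_dir and the eval cadence (every 500 steps, inferred from the
# results table) are assumptions; Adam betas/epsilon are Trainer defaults.
training_args = Seq2SeqTrainingArguments(
    output_dir="Vit-GPT2-COCO2017Flickr-02",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
    evaluation_strategy="steps",  # argument name valid in Transformers 4.39
    eval_steps=500,
    logging_steps=500,
    predict_with_generate=True,   # needed to compute ROUGE/Gen Len during eval
)
```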

Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 0.2425        | 0.08  | 500   | 0.2258          | 40.7869 | 15.199  | 37.0489 | 37.0626   | 11.6315 |
| 0.2201        | 0.15  | 1000  | 0.2249          | 40.1404 | 14.8742 | 36.584  | 36.5776   | 11.9823 |
| 0.219         | 0.23  | 1500  | 0.2247          | 40.8233 | 15.4793 | 37.2918 | 37.2909   | 11.25   |
| 0.2111        | 0.31  | 2000  | 0.2235          | 40.9526 | 15.2346 | 37.3222 | 37.3373   | 11.3288 |
| 0.2093        | 0.38  | 2500  | 0.2231          | 40.8278 | 15.4807 | 37.0495 | 37.0609   | 12.0504 |
| 0.2029        | 0.46  | 3000  | 0.2237          | 41.0299 | 15.7008 | 37.4951 | 37.4861   | 12.0935 |
| 0.2078        | 0.54  | 3500  | 0.2233          | 40.6441 | 15.5267 | 37.1304 | 37.1546   | 11.7654 |
| 0.1998        | 0.62  | 4000  | 0.2241          | 41.2438 | 15.6237 | 37.3616 | 37.3653   | 11.7535 |
| 0.1963        | 0.69  | 4500  | 0.2237          | 41.5874 | 15.9016 | 38.0843 | 38.1149   | 11.5485 |
| 0.197         | 0.77  | 5000  | 0.2238          | 41.2501 | 16.2728 | 37.4111 | 37.4342   | 11.5915 |
| 0.1924        | 0.85  | 5500  | 0.2249          | 40.8554 | 15.434  | 37.3203 | 37.3119   | 11.86   |
| 0.1957        | 0.92  | 6000  | 0.2248          | 40.695  | 15.3006 | 37.1779 | 37.1898   | 11.8842 |
| 0.1919        | 1.0   | 6500  | 0.2227          | 40.4899 | 15.3529 | 36.9403 | 36.9674   | 11.8185 |
| 0.1502        | 1.08  | 7000  | 0.2332          | 40.9993 | 15.3624 | 37.4968 | 37.5274   | 11.955  |
| 0.1463        | 1.15  | 7500  | 0.2340          | 41.1808 | 16.0105 | 37.7805 | 37.7884   | 11.7792 |
| 0.1503        | 1.23  | 8000  | 0.2364          | 41.3334 | 15.6562 | 37.7087 | 37.7118   | 11.5815 |
| 0.1496        | 1.31  | 8500  | 0.2320          | 41.171  | 15.6112 | 37.4079 | 37.4274   | 11.8477 |
| 0.1491        | 1.38  | 9000  | 0.2328          | 41.0707 | 15.5662 | 37.5235 | 37.5222   | 11.735  |
| 0.1418        | 1.46  | 9500  | 0.2344          | 41.3775 | 16.2084 | 37.8977 | 37.9202   | 11.5685 |
| 0.1474        | 1.54  | 10000 | 0.2326          | 41.4136 | 16.1038 | 37.4991 | 37.5212   | 11.9992 |
| 0.1414        | 1.62  | 10500 | 0.2364          | 41.3191 | 15.8292 | 37.5841 | 37.6033   | 11.9308 |
| 0.1419        | 1.69  | 11000 | 0.2391          | 41.6061 | 16.0641 | 37.9547 | 37.9706   | 11.6719 |
| 0.1398        | 1.77  | 11500 | 0.2342          | 41.9828 | 16.4948 | 38.2849 | 38.3078   | 11.5842 |
| 0.1427        | 1.85  | 12000 | 0.2347          | 41.3131 | 15.7264 | 37.4993 | 37.5159   | 11.9746 |
| 0.1372        | 1.92  | 12500 | 0.2353          | 41.8467 | 16.3585 | 38.1331 | 38.1278   | 11.5858 |
| 0.1322        | 2.0   | 13000 | 0.2368          | 41.8492 | 16.1515 | 38.213  | 38.2573   | 11.3688 |
| 0.1031        | 2.08  | 13500 | 0.2567          | 41.3124 | 15.7976 | 37.6082 | 37.6376   | 11.9769 |
| 0.1061        | 2.15  | 14000 | 0.2532          | 41.651  | 16.1237 | 37.9306 | 37.955    | 12.1223 |
| 0.1036        | 2.23  | 14500 | 0.2571          | 41.3558 | 16.0047 | 37.6471 | 37.668    | 11.8531 |
| 0.1023        | 2.31  | 15000 | 0.2559          | 41.4787 | 15.911  | 37.7424 | 37.7684   | 11.8785 |
| 0.1056        | 2.38  | 15500 | 0.2566          | 41.638  | 16.0218 | 37.9238 | 37.9395   | 11.81   |
| 0.1034        | 2.46  | 16000 | 0.2575          | 41.5721 | 16.2242 | 37.8949 | 37.9075   | 11.8492 |
| 0.1037        | 2.54  | 16500 | 0.2572          | 41.6212 | 15.9041 | 37.9474 | 37.9701   | 11.6635 |
| 0.1017        | 2.62  | 17000 | 0.2565          | 41.4034 | 15.8097 | 37.7397 | 37.7466   | 11.8096 |
| 0.1019        | 2.69  | 17500 | 0.2578          | 41.5811 | 15.9254 | 37.8885 | 37.9191   | 11.7215 |
| 0.0955        | 2.77  | 18000 | 0.2585          | 41.8661 | 16.3595 | 38.3758 | 38.3996   | 11.6642 |
| 0.0975        | 2.85  | 18500 | 0.2599          | 41.5204 | 15.9178 | 37.93   | 37.9513   | 11.8031 |
| 0.0991        | 2.92  | 19000 | 0.2595          | 41.9135 | 16.1875 | 38.1738 | 38.1353   | 11.7381 |
| 0.0975        | 3.0   | 19500 | 0.2598          | 41.8246 | 16.1808 | 38.0947 | 38.0582   | 11.7462 |
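
For reference, ROUGE scores like those in the table can be computed with the `evaluate` library; the sketch below uses placeholder captions and assumes the card's metrics follow the usual Trainer convention of scaling `evaluate`'s [0, 1] scores by 100.

```python
import evaluate

# Sketch of the ROUGE computation; the captions are placeholders.
rouge = evaluate.load("rouge")
predictions = ["a dog runs across the grass"]
references = ["a dog is running on the grass"]
scores = rouge.compute(predictions=predictions, references=references)
# evaluate returns rouge1/rouge2/rougeL/rougeLsum as fractions in [0, 1];
# the table above reports them multiplied by 100.
print(scores)
```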

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2
