mm-interp-RLAIF-V-Dataset-llava-mistral

This model is a fine-tuned version of llava-hf/llava-v1.6-mistral-7b-hf on the RLAIF-V-Dataset. It achieves the following results on the evaluation set (a brief consistency check on these figures follows the list):

  • Loss: 0.4513
  • Rewards/chosen: -3.2808
  • Rewards/rejected: -6.0928
  • Rewards/accuracies: 0.8212
  • Rewards/margins: 2.8121
  • Logps/rejected: -219.8085
  • Logps/chosen: -191.2850
  • Logits/rejected: -2.2605
  • Logits/chosen: -2.2964
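
The metric names above are the ones TRL's DPOTrainer logs during direct preference optimization (DPO); assuming that setup, rewards/chosen and rewards/rejected are the policy's implicit rewards on the preferred and rejected responses, and rewards/margins is simply their difference. A minimal check, with the values copied from the final evaluation figures on this card:

```python
# rewards/margins should equal rewards/chosen - rewards/rejected up to rounding.
rewards_chosen = -3.2808
rewards_rejected = -6.0928
print(rewards_chosen - rewards_rejected)  # 2.8120, vs. the reported 2.8121
```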

Model description

More information needed

Intended uses & limitations

More information needed
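
Pending a fuller write-up, a minimal inference sketch using the LLaVA-NeXT classes from transformers is given below. The repo id comes from this card; the example image URL and the question are placeholders, and the [INST] ... [/INST] template follows the Mistral-based LLaVA-v1.6 base model.

```python
import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "htlou/mm-interp-RLAIF-V-Dataset-llava-mistral"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder example image; substitute your own.
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Prompt template used by the Mistral-based LLaVA-v1.6 models.
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```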

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction as a TRL-style config follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 256
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3.0
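
Assuming the run used TRL's DPOTrainer (the reward/log-probability metrics on this card match its logging), the list above maps onto a DPOConfig roughly as follows. The output directory name is hypothetical, and the DPO beta and dataset plumbing are not stated on the card, so they are omitted here.

```python
from trl import DPOConfig

# Hedged reconstruction of the hyperparameters listed above. DPOConfig
# subclasses transformers.TrainingArguments, so these are standard trainer
# fields; the Adam betas/epsilon above are the trainer defaults. Launch on
# 8 GPUs (e.g. with accelerate) to match the reported total batch sizes.
training_args = DPOConfig(
    output_dir="mm-interp-RLAIF-V-Dataset-llava-mistral",  # hypothetical name
    learning_rate=1e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # 8 GPUs x 8 per device x 4 = 256 effective
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3.0,
)
```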

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5989        | 0.1368 | 40   | 0.6069          | -0.3887        | -0.8615          | 0.6365             | 0.4728          | -167.4954      | -162.3644    | -2.4012         | -2.4102       |
| 0.5452        | 0.2735 | 80   | 0.5331          | -0.8812        | -1.8338          | 0.7135             | 0.9526          | -177.2182      | -167.2896    | -2.5177         | -2.5334       |
| 0.5026        | 0.4103 | 120  | 0.4925          | -1.4411        | -2.6703          | 0.7442             | 1.2292          | -185.5836      | -172.8887    | -1.9765         | -2.0268       |
| 0.4511        | 0.5470 | 160  | 0.4683          | -1.3283        | -3.0284          | 0.7625             | 1.7001          | -189.1644      | -171.7603    | -2.0280         | -2.0709       |
| 0.4562        | 0.6838 | 200  | 0.4528          | -1.4943        | -3.2675          | 0.7567             | 1.7732          | -191.5553      | -173.4200    | -2.1029         | -2.1462       |
| 0.4189        | 0.8205 | 240  | 0.4494          | -1.9309        | -3.8899          | 0.7663             | 1.9589          | -197.7792      | -177.7867    | -2.4165         | -2.4472       |
| 0.4484        | 0.9573 | 280  | 0.4432          | -1.7397        | -3.8238          | 0.7635             | 2.0841          | -197.1187      | -175.8746    | -2.1586         | -2.2000       |
| 0.222         | 1.0940 | 320  | 0.4504          | -1.2207        | -2.9698          | 0.7760             | 1.7491          | -188.5780      | -170.6839    | -2.4060         | -2.4397       |
| 0.2018        | 1.2308 | 360  | 0.4438          | -2.0855        | -4.4746          | 0.7885             | 2.3891          | -203.6262      | -179.3325    | -2.3445         | -2.3790       |
| 0.2017        | 1.3675 | 400  | 0.4350          | -1.9109        | -4.1414          | 0.7981             | 2.2305          | -200.2943      | -177.5862    | -2.3022         | -2.3351       |
| 0.1999        | 1.5043 | 440  | 0.4288          | -2.1056        | -4.4641          | 0.8048             | 2.3585          | -203.5214      | -179.5331    | -2.1361         | -2.1716       |
| 0.1837        | 1.6410 | 480  | 0.4262          | -2.2318        | -4.7056          | 0.8125             | 2.4738          | -205.9359      | -180.7949    | -2.2127         | -2.2452       |
| 0.1942        | 1.7778 | 520  | 0.4163          | -2.3806        | -5.0283          | 0.8115             | 2.6478          | -209.1637      | -182.2829    | -2.3333         | -2.3675       |
| 0.1821        | 1.9145 | 560  | 0.4165          | -2.2038        | -4.6709          | 0.8173             | 2.4671          | -205.5893      | -180.5155    | -2.3238         | -2.3543       |
| 0.0858        | 2.0513 | 600  | 0.4415          | -2.7029        | -5.1979          | 0.8144             | 2.4950          | -210.8597      | -185.5066    | -2.2872         | -2.3220       |
| 0.0832        | 2.1880 | 640  | 0.4414          | -2.8951        | -5.6554          | 0.8173             | 2.7603          | -215.4344      | -187.4282    | -2.2892         | -2.3247       |
| 0.0817        | 2.3248 | 680  | 0.4521          | -3.2403        | -6.0014          | 0.8154             | 2.7611          | -218.8945      | -190.8804    | -2.2697         | -2.3056       |
| 0.0858        | 2.4615 | 720  | 0.4479          | -3.3847        | -6.3012          | 0.8221             | 2.9165          | -221.8926      | -192.3248    | -2.2708         | -2.3072       |
| 0.0723        | 2.5983 | 760  | 0.4574          | -3.3436        | -6.1113          | 0.8173             | 2.7677          | -219.9932      | -191.9133    | -2.2754         | -2.3103       |
| 0.0717        | 2.7350 | 800  | 0.4532          | -3.3171        | -6.1289          | 0.8192             | 2.8118          | -220.1688      | -191.6483    | -2.2610         | -2.2973       |
| 0.0691        | 2.8718 | 840  | 0.4514          | -3.2739        | -6.0855          | 0.8212             | 2.8116          | -219.7354      | -191.2166    | -2.2604         | -2.2964       |

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.3