mm-interp-RLAIF-V-Dataset-llava-mistral

This model is a fine-tuned version of llava-hf/llava-v1.6-mistral-7b-hf on the RLAIF-V-Dataset. It achieves the following results on the evaluation set (a brief consistency check on these figures follows the list):

  • Loss: 0.4513
  • Rewards/chosen: -3.2808
  • Rewards/rejected: -6.0928
  • Rewards/accuracies: 0.8212
  • Rewards/margins: 2.8121
  • Logps/rejected: -219.8085
  • Logps/chosen: -191.2850
  • Logits/rejected: -2.2605
  • Logits/chosen: -2.2964
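
The metric names above are the ones TRL's DPOTrainer logs during direct preference optimization (DPO); assuming that setup, rewards/chosen and rewards/rejected are the policy's implicit rewards on the preferred and rejected responses, and rewards/margins is simply their difference. A minimal check, with the values copied from the final evaluation figures on this card:

```python
# rewards/margins should equal rewards/chosen - rewards/rejected up to rounding.
rewards_chosen = -3.2808
rewards_rejected = -6.0928
print(rewards_chosen - rewards_rejected)  # 2.8120, vs. the reported 2.8121
```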

Model description

More information needed

Intended uses & limitations

More information needed
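
Pending a fuller write-up, a minimal inference sketch using the LLaVA-NeXT classes from transformers is given below. The repo id comes from this card; the example image URL and the question are placeholders, and the [INST] ... [/INST] template follows the Mistral-based LLaVA-v1.6 base model.

```python
import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "htlou/mm-interp-RLAIF-V-Dataset-llava-mistral"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder example image; substitute your own.
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Prompt template used by the Mistral-based LLaVA-v1.6 models.
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```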

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction as a TRL-style config follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 256
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3.0
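
Assuming the run used TRL's DPOTrainer (the reward/log-probability metrics on this card match its logging), the list above maps onto a DPOConfig roughly as follows. The output directory name is hypothetical, and the DPO beta and dataset plumbing are not stated on the card, so they are omitted here.

```python
from trl import DPOConfig

# Hedged reconstruction of the hyperparameters listed above. DPOConfig
# subclasses transformers.TrainingArguments, so these are standard trainer
# fields; the Adam betas/epsilon above are the trainer defaults. Launch on
# 8 GPUs (e.g. with accelerate) to match the reported total batch sizes.
training_args = DPOConfig(
    output_dir="mm-interp-RLAIF-V-Dataset-llava-mistral",  # hypothetical name
    learning_rate=1e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # 8 GPUs x 8 per device x 4 = 256 effective
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3.0,
)
```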

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5989        | 0.1368 | 40   | 0.6069          | -0.3887        | -0.8615          | 0.6365             | 0.4728          | -167.4954      | -162.3644    | -2.4012         | -2.4102       |
| 0.5452        | 0.2735 | 80   | 0.5331          | -0.8812        | -1.8338          | 0.7135             | 0.9526          | -177.2182      | -167.2896    | -2.5177         | -2.5334       |
| 0.5026        | 0.4103 | 120  | 0.4925          | -1.4411        | -2.6703          | 0.7442             | 1.2292          | -185.5836      | -172.8887    | -1.9765         | -2.0268       |
| 0.4511        | 0.5470 | 160  | 0.4683          | -1.3283        | -3.0284          | 0.7625             | 1.7001          | -189.1644      | -171.7603    | -2.0280         | -2.0709       |
| 0.4562        | 0.6838 | 200  | 0.4528          | -1.4943        | -3.2675          | 0.7567             | 1.7732          | -191.5553      | -173.4200    | -2.1029         | -2.1462       |
| 0.4189        | 0.8205 | 240  | 0.4494          | -1.9309        | -3.8899          | 0.7663             | 1.9589          | -197.7792      | -177.7867    | -2.4165         | -2.4472       |
| 0.4484        | 0.9573 | 280  | 0.4432          | -1.7397        | -3.8238          | 0.7635             | 2.0841          | -197.1187      | -175.8746    | -2.1586         | -2.2000       |
| 0.222         | 1.0940 | 320  | 0.4504          | -1.2207        | -2.9698          | 0.7760             | 1.7491          | -188.5780      | -170.6839    | -2.4060         | -2.4397       |
| 0.2018        | 1.2308 | 360  | 0.4438          | -2.0855        | -4.4746          | 0.7885             | 2.3891          | -203.6262      | -179.3325    | -2.3445         | -2.3790       |
| 0.2017        | 1.3675 | 400  | 0.4350          | -1.9109        | -4.1414          | 0.7981             | 2.2305          | -200.2943      | -177.5862    | -2.3022         | -2.3351       |
| 0.1999        | 1.5043 | 440  | 0.4288          | -2.1056        | -4.4641          | 0.8048             | 2.3585          | -203.5214      | -179.5331    | -2.1361         | -2.1716       |
| 0.1837        | 1.6410 | 480  | 0.4262          | -2.2318        | -4.7056          | 0.8125             | 2.4738          | -205.9359      | -180.7949    | -2.2127         | -2.2452       |
| 0.1942        | 1.7778 | 520  | 0.4163          | -2.3806        | -5.0283          | 0.8115             | 2.6478          | -209.1637      | -182.2829    | -2.3333         | -2.3675       |
| 0.1821        | 1.9145 | 560  | 0.4165          | -2.2038        | -4.6709          | 0.8173             | 2.4671          | -205.5893      | -180.5155    | -2.3238         | -2.3543       |
| 0.0858        | 2.0513 | 600  | 0.4415          | -2.7029        | -5.1979          | 0.8144             | 2.4950          | -210.8597      | -185.5066    | -2.2872         | -2.3220       |
| 0.0832        | 2.1880 | 640  | 0.4414          | -2.8951        | -5.6554          | 0.8173             | 2.7603          | -215.4344      | -187.4282    | -2.2892         | -2.3247       |
| 0.0817        | 2.3248 | 680  | 0.4521          | -3.2403        | -6.0014          | 0.8154             | 2.7611          | -218.8945      | -190.8804    | -2.2697         | -2.3056       |
| 0.0858        | 2.4615 | 720  | 0.4479          | -3.3847        | -6.3012          | 0.8221             | 2.9165          | -221.8926      | -192.3248    | -2.2708         | -2.3072       |
| 0.0723        | 2.5983 | 760  | 0.4574          | -3.3436        | -6.1113          | 0.8173             | 2.7677          | -219.9932      | -191.9133    | -2.2754         | -2.3103       |
| 0.0717        | 2.7350 | 800  | 0.4532          | -3.3171        | -6.1289          | 0.8192             | 2.8118          | -220.1688      | -191.6483    | -2.2610         | -2.2973       |
| 0.0691        | 2.8718 | 840  | 0.4514          | -3.2739        | -6.0855          | 0.8212             | 2.8116          | -219.7354      | -191.2166    | -2.2604         | -2.2964       |

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.3