---
license: apache-2.0
---

**Base Model**: BLIP2-T5, pretrained version

**Fine-tuning data**: LLaVA 150k (for multi-round conversations, one instruction-answer pair is sampled)

**Hyper-parameters**:

**v0**
* lr = 2e-5, decayed to 0.0 with a cosine lr scheduler
* gbs (global batch size) = 32
* image size = 480
* weight decay = 0.05

**v1** (same as LLaVA)
* lr = 2e-5
* gbs (global batch size) = 32
* image size = 480
* weight decay = 0.0
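As a rough illustration, the v0 cosine schedule (lr decaying from the peak of 2e-5 down to 0.0) can be sketched as below; the total step count is an arbitrary assumption for the example, not a value from the actual training run.

```python
import math

def cosine_lr(step, total_steps, peak_lr=2e-5):
    """Cosine-annealed learning rate: peak_lr at step 0, 0.0 at total_steps.

    Sketch only; `total_steps` here is illustrative, not the real run length.
    """
    progress = min(step / total_steps, 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# The lr starts at the peak and reaches 0.0 at the end of training.
print(cosine_lr(0, 1000))     # 2e-05
print(cosine_lr(500, 1000))   # 1e-05 (halfway point of the cosine curve)
print(cosine_lr(1000, 1000))  # 0.0
```

The v1 configuration keeps the lr constant at 2e-5, so no scheduler logic is needed there.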