Metadata
license: apache-2.0
Base model: BLIP-2 (T5, pretrained version)
Finetune data: LLaVA 150k (for multi-round conversations, sample one instruction-answer pair)
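The sampling rule above (keep a single instruction-answer pair per multi-round conversation) can be sketched as below. This assumes the LLaVA-style JSON layout, where each example carries a `conversations` list of alternating `{"from": "human"|"gpt", "value": ...}` turns; the function name and output keys are illustrative, not from the original.

```python
import random

def sample_single_turn(example):
    """Pick one (instruction, answer) pair from a LLaVA-style example.

    Assumes example["conversations"] alternates human/gpt turns;
    multi-round conversations therefore yield several candidate pairs,
    from which one is sampled uniformly at random.
    """
    turns = example["conversations"]
    # Pair each human turn with the gpt turn that follows it.
    pairs = [(turns[i], turns[i + 1]) for i in range(0, len(turns) - 1, 2)]
    instr, ans = random.choice(pairs)
    return {"instruction": instr["value"], "answer": ans["value"]}
```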
Hyper-parameters:
**v0**
- lr = 2e-5, decayed to 0.0 with a cosine lr scheduler
- global batch size (gbs) = 32
- image size = 480
- weight decay = 0.05
**v1** (same as LLaVA)
- lr = 2e-5
- global batch size (gbs) = 32
- image size = 480
- weight decay = 0.0
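The v0 cosine schedule (lr decaying from 2e-5 to 0.0) can be written out as a small closed-form helper; this is a minimal sketch of the standard cosine decay, with the step count as a placeholder since the total number of training steps is not stated in the card.

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Cosine decay from base_lr to min_lr over total_steps (v0 schedule)."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

At step 0 this returns the base lr, at the midpoint half of it, and at the final step the minimum (0.0 here), matching the `2e-5 --> 0.0` behavior listed for v0.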