---
license: apache-2.0
---

**Base Model**: BLIP2-t5 pretrained version

**Fine-tuning data**:

* LLAVA 150k (for multi-round conversations, one instruction-answer pair is sampled)
* MiniGPT4 3,500 pairs

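The single-pair sampling rule for LLAVA can be sketched as follows. This is a minimal illustration, not the actual preprocessing code; the record layout (a list of instruction-answer rounds) is an assumption.

```python
import random

def sample_one_pair(conversation):
    """For a multi-round conversation, keep one randomly chosen
    instruction-answer pair (the sampling rule described above).

    conversation: list of (instruction, answer) tuples, one per round.
    """
    return random.choice(conversation)

# Hypothetical two-round conversation record.
rounds = [
    ("Describe the image.", "A dog sitting on the grass."),
    ("What breed is it?", "It looks like a corgi."),
]
instruction, answer = sample_one_pair(rounds)
```

Single-round conversations pass through unchanged, since `random.choice` on a one-element list returns that element.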
**Hyper-parameters**:

* BLIP2-flant5-xl + LLAVA (initial commits)
  * **v0**:
    * lr = 2e-5 --> 0.0 with cosine lr scheduler
    * gbs (global batch size) = 32
    * image size = 480
    * weight decay = 0.05
  * **v1 (same as LLAVA)**:
    * lr = 2e-5
    * gbs = 32
    * image size = 480
    * weight decay = 0.0
* BLIP2-flant5-xl + MiniGPT4
  * lr = 2e-5
  * gbs = 32
  * image size = 480
  * weight decay = 0.0
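The v0 schedule above (lr decaying from 2e-5 to 0.0 with cosine annealing) can be sketched as a plain function. This is an illustrative sketch only: the total step count and the absence of a warmup phase are assumptions, not taken from the training code.

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Cosine-annealed learning rate, decaying from base_lr at step 0
    down to min_lr at total_steps (the v0 schedule described above)."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# At step 0 the lr equals the base value (2e-5); by the final step it
# has decayed to min_lr (0.0), with a smooth cosine curve in between.
print(cosine_lr(0, 1000))
print(cosine_lr(1000, 1000))
```

An equivalent effect is usually obtained with a framework's built-in scheduler (e.g. cosine annealing in PyTorch); the closed form is shown here only to make the decay explicit.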