# llava-v1.5-llama-3-8b-pretrain Model Card
This is a pretrained checkpoint containing the MLP connector produced by LLaVA stage-1 (feature alignment) training; you can use it to instruction-tune your own multimodal models. Please follow my reproduced implementation, LLaVA-Llama-3, for more details on fine-tuning a LLaVA model with Llama-3 as the foundation LLM.
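Before fine-tuning, it can be useful to sanity-check the connector weights. Here is a minimal PyTorch sketch, assuming the connector is stored as `mm_projector.bin` (the conventional LLaVA stage-1 filename; adjust the path to this repository's actual file):

```python
import torch

# Assumed filename: LLaVA stage-1 connectors are conventionally saved as
# "mm_projector.bin" -- replace with the actual file in this repository.
state_dict = torch.load("mm_projector.bin", map_location="cpu")

# Print each parameter name and shape to confirm the connector's layout.
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```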
## Training dataset
- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
## Architecture
- LLM: llama-3-8b (Frozen)
- Vision-Language Adapter: MLP (see the sketch after this list)
- Vision Encoder: CLIP-ViT-L-336px (Frozen)
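The adapter is the only trained component in stage 1; both the LLM and the vision encoder stay frozen. Below is a minimal sketch of a LLaVA-1.5-style two-layer GELU projector, assuming the CLIP-ViT-L-336px hidden size of 1024 and the Llama-3-8B hidden size of 4096 (the class name and defaults are illustrative, not taken from this repository):

```python
import torch.nn as nn

class MLPProjector(nn.Module):
    """Two-layer MLP that maps CLIP patch features into the LLM embedding space."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Mirrors LLaVA-1.5's "mlp2x_gelu" connector: Linear -> GELU -> Linear.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features):
        # image_features: (batch, num_patches, vision_dim) from the frozen CLIP encoder.
        return self.proj(image_features)
```

The projected features are concatenated with the text token embeddings, so only this small module has to learn the alignment between the two frozen backbones.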