LLaVA-JP Model Card
This is a pretrained checkpoint that you can use to instruction-tune your own multimodal models.
Check out the instructions here
Model details
Model type:
LLaVA-JP is a vision-language model that can converse about input images.
The model is a large vision-language model (LVLM) trained with google/siglip-so400m-patch14-384 as the image encoder and llm-jp/llm-jp-1.3b-v1.0 as the text decoder. It supports 768 x 768 high-resolution input images via the scaling_on_scales method.
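To illustrate how a 384 x 384 encoder can be applied to a 768 x 768 image, the following is a minimal sketch of the multi-scale tiling idea: the full-resolution image is cut into encoder-sized tiles, and a downscaled copy provides the global view. It is not the LLaVA-JP implementation. The function names `split_into_tiles` and `downscale_2x` are hypothetical, the 2 x 2 average pooling is a stand-in for whatever resizing the real preprocessing uses, and the encoder call and feature merging are omitted.

```python
import numpy as np


def split_into_tiles(image, tile=384):
    """Split an (H, W, C) image into non-overlapping (tile, tile, C) tiles.

    Tiles are returned in row-major order: left to right, then top to bottom.
    Hypothetical helper for illustration only.
    """
    h, w, c = image.shape
    if h % tile or w % tile:
        raise ValueError("image height and width must be multiples of tile")
    rows, cols = h // tile, w // tile
    # Expose the tile grid as separate axes, then bring the grid axes together.
    tiles = image.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, tile, tile, c)


def downscale_2x(image):
    """Halve the resolution with 2 x 2 average pooling (assumed stand-in for resizing)."""
    h, w, c = image.shape
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    return image.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


if __name__ == "__main__":
    # Dummy 768 x 768 RGB image in place of a real input.
    image = np.zeros((768, 768, 3), dtype=np.float32)
    local_views = split_into_tiles(image)   # shape (4, 384, 384, 3)
    global_view = downscale_2x(image)       # shape (384, 384, 3)
    print(local_views.shape, global_view.shape)
```

Each of the five 384 x 384 views matches the input size of the image encoder, so the same encoder can process all of them; how the resulting features are combined is left out of this sketch.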
Training dataset
Acknowledgement
License
Apache-2.0