How to train this vision part by yourself?
#9 · opened by CCRss
Hello, I'm interested in training this ViT part on our own dataset, or at least fine-tuning it. My goal is to later combine this vision part with a language model to build an MLLM, but it's quite hard. Could you please give me some suggestions on how to do this?
Hi, you can refer to our InternVL2 series models. We have already combined this vision encoder with an LLM to construct an MLLM, which you can fine-tune directly on your data. You are free to decide whether to fine-tune the vision encoder, the MLP projector, or the LLM, based on your needs.
For details, please see here: https://internvl.readthedocs.io/en/latest/internvl2.0/finetune.html.
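Choosing which components to fine-tune usually comes down to toggling `requires_grad` on the relevant submodules before building the optimizer. A minimal PyTorch sketch of that pattern is below; the module names (`vision_encoder`, `mlp_projector`, `llm`) and the toy layers are illustrative assumptions, not the actual InternVL2 class attributes, so check the real model definition for its module names.

```python
import torch.nn as nn

# Toy stand-in for an MLLM with the three components mentioned above.
# Attribute names here are hypothetical; inspect the real InternVL2
# model class for the actual submodule names.
class ToyMLLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)
        self.mlp_projector = nn.Linear(8, 8)
        self.llm = nn.Linear(8, 8)

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every parameter of a submodule."""
    for p in module.parameters():
        p.requires_grad = trainable

model = ToyMLLM()
# Example choice: freeze the vision encoder, train projector + LLM.
set_trainable(model.vision_encoder, False)
set_trainable(model.mlp_projector, True)
set_trainable(model.llm, True)

# Only pass the trainable parameters to the optimizer.
trainable_params = [p for p in model.parameters() if p.requires_grad]
print(f"{len(trainable_params)} trainable parameter tensors")
```

The same idea applies regardless of which combination you pick: freeze what you want to keep fixed, then construct the optimizer over the remaining parameters.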
czczup changed discussion status to closed