How to train this vision part by yourself?
#9 · opened by CCRss
Hello, I'm interested in training this ViT part on our own dataset, or at least fine-tuning it. My goal is to later combine this vision part with a language model to build an MLLM, but it's quite hard. Could you please give me some suggestions on how to do this?
Hi, you can refer to our InternVL2 series models. We have already combined this vision encoder with an LLM to construct an MLLM, which you can fine-tune directly on your data. You are free to decide whether to fine-tune the vision encoder, the MLP projector, or the LLM, based on your needs.
For details, please see here: https://internvl.readthedocs.io/en/latest/internvl2.0/finetune.html.
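Choosing which components to fine-tune usually comes down to toggling `requires_grad` on the relevant submodules before building the optimizer. A minimal PyTorch sketch of that pattern is below; the module names (`vision_encoder`, `mlp_projector`, `llm`) and the toy layers are illustrative assumptions, not the actual InternVL2 class attributes, so check the real model definition for its module names.

```python
import torch.nn as nn

# Toy stand-in for an MLLM with the three components mentioned above.
# Attribute names here are hypothetical; inspect the real InternVL2
# model class for the actual submodule names.
class ToyMLLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)
        self.mlp_projector = nn.Linear(8, 8)
        self.llm = nn.Linear(8, 8)

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every parameter of a submodule."""
    for p in module.parameters():
        p.requires_grad = trainable

model = ToyMLLM()
# Example choice: freeze the vision encoder, train projector + LLM.
set_trainable(model.vision_encoder, False)
set_trainable(model.mlp_projector, True)
set_trainable(model.llm, True)

# Only pass the trainable parameters to the optimizer.
trainable_params = [p for p in model.parameters() if p.requires_grad]
print(f"{len(trainable_params)} trainable parameter tensors")
```

The same idea applies regardless of which combination you pick: freeze what you want to keep fixed, then construct the optimizer over the remaining parameters.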
czczup changed discussion status to closed