Use InternVL2.5-78B in a modular fashion

#1
by Walley - opened

Is it possible to use InternVL2.5-78B in a modular fashion? Specifically, I would like to attach the InternVL head for multimodal tasks and use only Qwen2.5 for text-only question answering.

OpenGVLab org

Yes, it’s possible. For multimodal inputs such as images and videos, use the full 78B model. For pure-text inputs, use only the language model part (Qwen2.5). See the Quick Start section in the README for details.
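As a rough illustration of that routing, here is a minimal sketch. The helper name `answer` and the default generation settings are hypothetical. It assumes the model object exposes `chat(tokenizer, pixel_values, question, generation_config)` as in the README Quick Start, where passing `pixel_values=None` gives a pure-text conversation. Whether the Qwen2.5 part can also be pulled out and run entirely on its own (for example through a `language_model` attribute) is an assumption to check against the model code.

```python
from typing import Any, Optional


def answer(model: Any, tokenizer: Any, question: str,
           pixel_values: Optional[Any] = None,
           generation_config: Optional[dict] = None) -> Any:
    """Hypothetical helper: send one query down the text-only or multimodal path.

    Assumes `model.chat(tokenizer, pixel_values, question, generation_config)`
    as shown in the README Quick Start; verify against the model card.
    """
    # Illustrative defaults, not values recommended by the authors.
    if generation_config is None:
        generation_config = {"max_new_tokens": 512, "do_sample": False}

    # Multimodal path: mark where the image goes if the caller did not.
    if pixel_values is not None and "<image>" not in question:
        question = "<image>\n" + question

    # Text-only path: pixel_values stays None, so no image tensor is passed in.
    return model.chat(tokenizer, pixel_values, question, generation_config)
```

With a loaded model and tokenizer, `answer(model, tokenizer, "Hello, who are you?")` would take the text-only path, while `answer(model, tokenizer, "Describe the image.", pixel_values=pixel_values)` would take the multimodal one.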
