Use InternVL2.5-78B in a modular fashion
#1, opened by Walley
Is it possible to use InternVL2.5-78B in a modular fashion? Specifically, I would like to attach the InternVL head for multimodal tasks and use only Qwen2.5 for text-only question answering.
Yes, it’s possible. For multimodal inputs like images and videos, use the full 78B model. For pure text inputs, use only the language model part (Qwen2.5). See the Quick Start section in the README for details.
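As a rough illustration of the answer above, here is a minimal sketch of routing both kinds of input through one loaded model. It assumes the Hub repo id `OpenGVLab/InternVL2_5-78B` and that the model's remote code exposes a `chat(tokenizer, pixel_values, question, generation_config)` method that accepts `pixel_values=None` for text-only questions. The helper names `load_model` and `ask` are hypothetical, and `device_map="auto"` is a simplification, so check the README's Quick Start for the recommended multi-GPU setup.

```python
from typing import Any, Optional

# Assumed Hub repo id for the 78B checkpoint; adjust to the one you actually use.
MODEL_PATH = "OpenGVLab/InternVL2_5-78B"


def load_model(path: str = MODEL_PATH):
    """Load the full multimodal model and its tokenizer (hypothetical helper).

    Imports are local so the rest of this sketch can be read and tested
    without torch/transformers installed.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(
        path,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,  # the chat() method lives in the repo's remote code
        device_map="auto",       # assumption: the README may recommend a custom device map
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(
        path, trust_remote_code=True, use_fast=False
    )
    return model, tokenizer


def ask(
    model: Any,
    tokenizer: Any,
    question: str,
    pixel_values: Optional[Any] = None,
    max_new_tokens: int = 512,
) -> Any:
    """Ask a question (hypothetical helper).

    With pixel_values=None the question is passed as pure text, so the
    vision inputs are simply absent; otherwise pass the preprocessed
    image tensor for a multimodal question.
    """
    generation_config = dict(max_new_tokens=max_new_tokens, do_sample=False)
    return model.chat(tokenizer, pixel_values, question, generation_config)
```

Usage would look like `model, tokenizer = load_model()` followed by `ask(model, tokenizer, "Tell me a story.")` for text, or `ask(model, tokenizer, "<image>\nDescribe the image.", pixel_values=pixel_values)` for an image, where `pixel_values` comes from the image preprocessing shown in the README. If you want the Qwen2.5 part as a standalone module, it is probably reachable as a submodule of the loaded model (likely `model.language_model`, but treat that attribute name as an assumption and verify it against the modeling code).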