Use InternVL2.5-78B in a modular fashion

by Walley - opened 6 days ago

Is it possible to use InternVL2.5-78B in a modular fashion? Specifically, I would like to attach the InternVL vision head for multimodal tasks and use only Qwen2.5 for text-only question answering.

czczup

OpenGVLab org 3 days ago

Yes, it’s possible. For multimodal inputs like images and videos, use the full 78B model. For pure text inputs, use only the language model part (Qwen2.5). See the Quick Start section in the README for details.
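As a rough illustration of that split, here is a minimal sketch of a single helper that routes both kinds of request. It assumes the loaded model exposes a `chat(tokenizer, pixel_values, question, generation_config)` method, as the Quick Start appears to show, and that passing `None` for `pixel_values` gives a pure-text conversation. The helper name `ask` and its defaults are hypothetical, so check them against the README before relying on this.

```python
from typing import Any, Optional


def ask(
    model: Any,
    tokenizer: Any,
    question: str,
    pixel_values: Optional[Any] = None,
    max_new_tokens: int = 512,
) -> str:
    """Send a question to the model, with or without visual input.

    Assumption: `model.chat` takes (tokenizer, pixel_values, question,
    generation_config) and returns the response string when no history
    is requested. `pixel_values=None` is assumed to mean a text-only
    turn, so only the language model part is exercised.
    """
    # Greedy decoding keeps the sketch deterministic; adjust as needed.
    generation_config = {"max_new_tokens": max_new_tokens, "do_sample": False}
    return model.chat(tokenizer, pixel_values, question, generation_config)


# Hypothetical usage, assuming the model and tokenizer were loaded with
# trust_remote_code=True as described in the README:
#   text_answer = ask(model, tokenizer, "Hello, who are you?")
#   image_answer = ask(model, tokenizer, "<image>\nDescribe the image.", pixel_values)
```

Note that this still loads the full 78B checkpoint and simply skips the vision path for text-only questions. Running Qwen2.5 on its own to save memory would mean loading the language model weights separately, which this sketch does not cover.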
