VITA-MLLM/VITA-1.5

Video-Text-to-Text

Safetensors

vita-Qwen2


This repository contains the model weights for the paper VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction.

Code: https://github.com/VITA-MLLM/VITA


Safetensors

Model size: 8.32B params

Tensor type: BF16
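As a rough guide to what these figures mean in practice, the sketch below estimates the memory needed just to hold the weights, assuming all 8.32B parameters are stored as BF16 (2 bytes each). The names `NUM_PARAMS`, `BYTES_PER_PARAM`, and `weight_bytes` are illustrative and not part of the VITA code base; the estimate ignores activations, caches, and any framework overhead.

```python
# Back-of-the-envelope estimate of weight memory from the listed figures.
# Assumption: every parameter is stored in BF16 (16 bits = 2 bytes).

NUM_PARAMS = 8_320_000_000   # "8.32B params" as listed on this page
BYTES_PER_PARAM = 2          # BF16 occupies 2 bytes per value


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Return the number of bytes needed to store the raw weights."""
    return num_params * bytes_per_param


total = weight_bytes(NUM_PARAMS, BYTES_PER_PARAM)

# Report in decimal gigabytes (10^9 bytes) and binary gibibytes (2^30 bytes).
print(f"Weights: {total / 1e9:.2f} GB ({total / 2**30:.2f} GiB)")
```

Under these assumptions the weights alone come to about 16.64 GB, so actual inference will need more memory than that.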
