Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction • Paper • 2501.03218 • Published about 20 hours ago • 19
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published 25 days ago • 136
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions • Paper • 2412.09596 • Published 26 days ago • 92
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation • Paper • 2306.04300 • Published Jun 7, 2023 • 1
VideoLLM-online: Online Video Large Language Model for Streaming Video • Paper • 2406.11816 • Published Jun 17, 2024 • 22
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling • Paper • 2412.05271 • Published Dec 6, 2024 • 123
VisionZip: Longer is Better but Not Necessary in Vision Language Models • Paper • 2412.04467 • Published Dec 5, 2024 • 105
MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms • Paper • 2410.18977 • Published Oct 24, 2024 • 14