VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions Paper • 2305.18756 • Published May 30, 2023
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation Paper • 2210.12460 • Published Oct 22, 2022
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding Paper • 2402.16050 • Published Feb 25, 2024 • 1
Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training Paper • 2305.18760 • Published May 30, 2023
LongViTU: Instruction Tuning for Long-Form Video Understanding Paper • 2501.05037 • Published Jan 9, 2025 • 1
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding Paper • 2412.17295 • Published Dec 23, 2024 • 9
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format Paper • 2411.17991 • Published Nov 27, 2024 • 5 • 2
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges Paper • 2409.01071 • Published Sep 2, 2024 • 27 • 6
HawkEye: Training Video-Text LLMs for Grounding Text in Videos Paper • 2403.10228 • Published Mar 15, 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning Paper • 2408.02210 • Published Aug 5, 2024 • 8