The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering Paper • 2502.03628 • Published 30 days ago • 12
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering Paper • 2502.03628 • Published 30 days ago • 12
Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding Paper • 2303.16341 • Published Mar 28, 2023
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding Paper • 2105.12723 • Published May 26, 2021
VideoGLUE: Video General Understanding Evaluation of Foundation Models Paper • 2307.03166 • Published Jul 6, 2023 • 5
Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition Paper • 2308.11489 • Published Aug 22, 2023
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20, 2024 • 24
Distilling Vision-Language Models on Millions of Videos Paper • 2401.06129 • Published Jan 11, 2024 • 17
Unified Visual Relationship Detection with Vision and Language Models Paper • 2303.08998 • Published Mar 16, 2023