VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions Paper • 2305.18756 • Published May 30, 2023
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation Paper • 2210.12460 • Published Oct 22, 2022
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding Paper • 2402.16050 • Published Feb 25, 2024 • 1
Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training Paper • 2305.18760 • Published May 30, 2023
LongViTU: Instruction Tuning for Long-Form Video Understanding Paper • 2501.05037 • Published 13 days ago • 1
Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners Paper • 2305.14825 • Published May 24, 2023 • 1
MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation Paper • 2306.15253 • Published Jun 27, 2023
HawkEye: Training Video-Text LLMs for Grounding Text in Videos Paper • 2403.10228 • Published Mar 15, 2024
RAM: Towards an Ever-Improving Memory System by Learning from Communications Paper • 2404.12045 • Published Apr 18, 2024 • 2
DiveR-CT: Diversity-enhanced Red Teaming with Relaxing Constraints Paper • 2405.19026 • Published May 29, 2024 • 7
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges Paper • 2409.01071 • Published Sep 2, 2024 • 27
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning Paper • 2408.02210 • Published Aug 5, 2024 • 8
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers Paper • 2406.16747 • Published Jun 24, 2024 • 19
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Paper • 2406.16338 • Published Jun 24, 2024 • 26
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering Paper • 2401.03901 • Published Jan 8, 2024
Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement Paper • 2406.07138 • Published Jun 11, 2024 • 1