Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 5 days ago • 119 • 8
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models Paper • 2409.13592 • Published Sep 20 • 48 • 9
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31 • 63 • 3
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19 • 53 • 8