TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models Paper • 2410.23266 • Published 29 days ago • 19
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22 • 88
VideoLLaMA 2 Collection Optimized VideoLLaMA with improved spatial-temporal modeling and better audio understanding capability • 13 items • Updated 15 days ago • 21