Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 4 days ago • 89
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published 9 days ago • 208
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities Paper • 2503.05652 • Published 7 days ago • 10
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning Paper • 2503.05379 • Published 7 days ago • 31
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published 8 days ago • 48
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published 8 days ago • 104
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 9 days ago • 79
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers Paper • 2503.00865 • Published 12 days ago • 58
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published 17 days ago • 68
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Paper • 2502.17258 • Published 18 days ago • 73
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information Paper • 2502.14258 • Published 23 days ago • 25
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 22 days ago • 129
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 22 days ago • 179