mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data Paper • 2502.08468 • Published 4 days ago • 10
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published 3 days ago • 20
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents Paper • 2502.09560 • Published 3 days ago • 27
Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights Paper • 2502.09619 • Published 3 days ago • 28
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published 4 days ago • 125
DPO-Shift: Shifting the Distribution of Direct Preference Optimization Paper • 2502.07599 • Published 5 days ago • 12
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published 5 days ago • 43
CoS: Chain-of-Shot Prompting for Long Video Understanding Paper • 2502.06428 • Published 6 days ago • 8
Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training Paper • 2502.06589 • Published 6 days ago • 16
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published 5 days ago • 24
Teaching Language Models to Critique via Reinforcement Learning Paper • 2502.03492 • Published 12 days ago • 22
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Paper • 2502.07374 • Published 5 days ago • 29
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published 5 days ago • 27
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance Paper • 2502.08127 • Published 5 days ago • 44
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning Paper • 2502.06533 • Published 6 days ago • 16
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published 5 days ago • 39
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models Paper • 2502.06788 • Published 6 days ago • 11