Large Language Models Do NOT Really Know What They Don't Know Paper • 2510.09033 • Published Oct 10 • 16
First Try Matters: Revisiting the Role of Reflection in Reasoning Models Paper • 2510.08308 • Published Oct 9 • 24
Scaling Language-Centric Omnimodal Representation Learning Paper • 2510.11693 • Published 30 days ago • 97
Scaling Language-Centric Omnimodal Representation Learning Paper • 2510.11693 • Published 30 days ago • 97
Scaling Language-Centric Omnimodal Representation Learning Paper • 2510.11693 • Published 30 days ago • 97
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28 • 170
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25 • 101
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25 • 101
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25 • 101
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning Paper • 2509.17437 • Published Sep 22 • 17
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth Paper • 2509.03867 • Published Sep 4 • 209
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published Jun 11 • 98
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning Paper • 2507.22607 • Published Jul 30 • 46
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning Paper • 2507.22607 • Published Jul 30 • 46
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning Paper • 2507.22607 • Published Jul 30 • 46
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning Paper • 2507.22607 • Published Jul 30 • 46
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19 • 131
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia Paper • 2502.06298 • Published Feb 10 • 1
Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning Paper • 2310.10962 • Published Oct 17, 2023