LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Paper • 2410.10813 • Published Oct 14, 2024 • 15
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19, 2025 • 134
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 185
Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search Paper • 2507.02652 • Published Jul 3, 2025 • 26
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning Paper • 2508.05405 • Published Aug 7, 2025 • 64
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models Paper • 2507.12806 • Published Jul 17, 2025 • 21
OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks Paper • 2508.05614 • Published Aug 7, 2025 • 20
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Paper • 2507.10548 • Published Jul 14, 2025 • 37
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21, 2025 • 268
Biomedical NLP papers Collection Papers posted on @ArxivHealthcareNLP@sigmoid.social (Clinical, Healthcare & Biomedical NLP) • 183 items • Updated Jan 24, 2025 • 43
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models Paper • 2412.01824 • Published Dec 2, 2024 • 64
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability Paper • 2411.19943 • Published Nov 29, 2024 • 62
MALT: Improving Reasoning with Multi-Agent LLM Training Paper • 2412.01928 • Published Dec 2, 2024 • 45