UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published 1 day ago • 86
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published 2 days ago • 70
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench Paper • 2508.20931 • Published 7 days ago • 15
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning Paper • 2508.21104 • Published 7 days ago • 27
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables Paper • 2508.19813 • Published 8 days ago • 20
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks Paper • 2508.18672 • Published 9 days ago • 9
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks Paper • 2508.15804 • Published 21 days ago • 14
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published 11 days ago • 76
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published 10 days ago • 176
InternVL3.5 Collection This collection includes all released checkpoints of InternVL3.5, covering different training stages (e.g., Pretraining, SFT, MPO, Cascade RL). • 54 items • Updated 6 days ago • 82
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published 13 days ago • 128
AetherCode: Evaluating LLMs' Ability to Win In Premier Programming Competitions Paper • 2508.16402 • Published 13 days ago • 14
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published 16 days ago • 116
ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks Paper • 2508.08240 • Published 24 days ago • 43
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published 15 days ago • 35