Abstract
Tool-Integrated Reasoning (TIR) expands the problem-solving capabilities of Large Language Models (LLMs) through external tools, and Advantage Shaping Policy Optimization (ASPO) stably steers how models use those tools.
We study why Tool-Integrated Reasoning (TIR) makes Large Language Models (LLMs) more capable. While LLMs integrated with tools like Python code interpreters show great promise, a principled theory explaining why this paradigm is effective has been missing. This work provides the first formal proof that TIR fundamentally expands an LLM's capabilities. We demonstrate that tools enable a strict expansion of the model's empirical and feasible support, breaking the capability ceiling of pure-text models by unlocking problem-solving strategies that are otherwise impossible or intractably verbose. To guide model behavior without compromising training stability or performance, we also introduce Advantage Shaping Policy Optimization (ASPO), a novel algorithm that modifies the advantage function directly to steer the policy's behavior. We conduct comprehensive experiments on challenging mathematical benchmarks, leveraging a Python interpreter as the external tool. Our results show that the TIR model decisively outperforms its pure-text counterpart on the pass@k metric. Crucially, this advantage is not confined to computationally intensive problems but extends to those requiring significant abstract insight. We further identify emergent cognitive patterns that illustrate how models learn to think with tools. Finally, we report improved tool-use behavior under ASPO, with earlier code invocation and substantially more interaction turns. Overall, our work provides the first principled explanation for TIR's success, shifting the focus from the mere fact that tools work to why and how they enable more powerful reasoning.
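To make the core idea of advantage shaping concrete, here is a minimal sketch of what "modifying the advantage function directly" could look like. The exact shaping term, coefficient `beta`, and the GRPO-style group normalization are assumptions for illustration, not the paper's precise formulation; the sketch rewards earlier first code invocation by adding a bonus to the advantage rather than to the reward.

```python
import statistics

def aspo_advantages(rewards, code_call_steps, beta=0.1, max_steps=512):
    """Hypothetical advantage-shaping sketch (not the paper's exact form).

    rewards: scalar outcome rewards for a group of sampled rollouts.
    code_call_steps: token index of the first code invocation per rollout.
    """
    # Group-normalized baseline advantage (GRPO-style).
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) + 1e-8
    adv = [(r - mean) / std for r in rewards]
    # Shaping term applied to the advantage, not the reward: an earlier
    # first code call yields a larger bonus, so the reward signal itself
    # stays intact while the policy is nudged toward early tool use.
    return [a + beta * (1.0 - s / max_steps)
            for a, s in zip(adv, code_call_steps)]
```

The key design point this sketch illustrates is that shaping the advantage leaves the underlying reward untouched, which is what the authors credit for avoiding the instability of traditional reward shaping.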
Community
Equipping Large Language Models (LLMs) with tools like a Python interpreter makes them far more capable. But are Python interpreters just glorified calculators, or is something deeper going on? While many have shown that tools work, the fundamental why and how has been a missing piece of the puzzle. We provide the first formal proof that Tool-Integrated Reasoning (TIR) fundamentally expands an LLM's capabilities, enabling previously impossible reasoning paths (Support Expansion) and making complex strategies practical within a finite token budget (Feasible Support). Our experiments on challenging math benchmarks confirm that TIR models solve a class of problems that are fundamentally out of reach for pure-text models, even on tasks requiring deep abstract insight, not just calculation. To stably guide how a model uses tools, we introduce Advantage Shaping Policy Optimization (ASPO), a novel algorithm that modifies the advantage directly, encouraging desired tool-use behaviors without the training instability and performance loss of traditional reward shaping.
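The "intractably verbose" notion behind Feasible Support can be seen with a toy task (our illustration, not an example from the paper): summing the digits of 2^1000. A pure-text model would have to carry out hundreds of digit-level operations token by token, while a single interpreter call does it exactly in a few lines.

```python
# A strategy that is a few tokens of code but hundreds of error-prone
# digit manipulations if carried out purely in text.
n = 2 ** 1000
num_digits = len(str(n))              # 2^1000 has 302 decimal digits
digit_sum = sum(int(d) for d in str(n))
print(num_digits, digit_sum)          # -> 302 1366
```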
The following similar papers were recommended by the Librarian Bot via the Semantic Scholar API:
- Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis (2025)
- Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving (2025)
- StepFun-Prover Preview: Let's Think and Verify Step by Step (2025)
- ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models (2025)
- Agentic-R1: Distilled Dual-Strategy Reasoning (2025)
- SABER: Switchable and Balanced Training for Efficient LLM Reasoning (2025)
- ProofCompass: Enhancing Specialized Provers with LLM Guidance (2025)
Models citing this paper: 2 · Datasets citing this paper: 2