arxiv:2601.06860

ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration

Published on Jan 11 · Submitted by Ania Forge on Jan 13

Renmin University of China


Authors:

Guanting Dong ,

Abstract

ET-Agent is a training framework that calibrates tool-use behavior in large language models through self-evolving data flywheels and behavior calibration training to improve task execution effectiveness.

AI-generated summary

Large Language Models (LLMs) can extend beyond the limits of their parametric knowledge by adopting the Tool-Integrated Reasoning (TIR) paradigm. However, existing LLM-based agent training frameworks often focus on answer accuracy and overlook specific alignment of behavior patterns. Consequently, agents often exhibit ineffective actions during TIR tasks, such as redundant or insufficient tool calls. How to calibrate erroneous behavioral patterns when executing TIR tasks, and thereby explore effective trajectories, remains an open problem. In this paper, we propose ET-Agent, a training framework for calibrating an agent's tool-use behavior from two synergistic perspectives: a Self-evolving Data Flywheel and Behavior Calibration Training. Specifically, we introduce a self-evolving data flywheel to generate enhanced data, which is used to fine-tune the LLM and improve its exploration ability. Building on this, we implement a two-phase behavior calibration training framework designed to progressively calibrate erroneous behavioral patterns toward optimal behaviors. Further in-depth experiments confirm the superiority of ET-Agent across multiple dimensions, including correctness, efficiency, reasoning conciseness, and tool execution accuracy. Our ET-Agent framework provides practical insights for research in the TIR field. Code is available at https://github.com/asilverlight/ET-Agent


Community

zhangboguodong

Paper submitter · about 12 hours ago · edited about 12 hours ago

Most current TIR work focuses only on the accuracy of agents in downstream tasks and does not calibrate the agents' behavioral patterns in TIR tasks. To address this issue, we first quantitatively analyze several possible erroneous behavioral patterns in current TIR tasks and classify them into two categories: "improper tool use" and "flawed reasoning logic". Based on this, we propose ET-Agent, a framework that fully calibrates the behavioral patterns of agents performing TIR tasks at both the data and algorithm levels. On the data side, we propose a self-evolving data flywheel, which enhances the training data by leveraging the agent's own reflective exploration capabilities. On the algorithm side, we propose a behavioral calibration training framework. It first performs rejection sampling fine-tuning on the enhanced training data to broaden the agent's exploration of the action space. It then applies iterative behavioral calibration reinforcement learning to calibrate the actions of the fine-tuned agent toward the optimal behavioral pattern.
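To make the rejection sampling fine-tuning step more concrete, here is a minimal sketch of the generic technique: draw several rollouts per question and keep only the distinct ones whose final answer matches the reference. The `Trajectory` record, the `rejection_sample` function, and the exact-match check are illustrative assumptions, not the data format or filtering criteria used by ET-Agent.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record of one TIR rollout (not the paper's data format)."""
    question: str
    tool_calls: tuple  # names of the tools invoked, in order
    answer: str


def rejection_sample(
    question: str,
    reference: str,
    sample_fn: Callable[[str], Trajectory],
    num_samples: int = 8,
) -> List[Trajectory]:
    """Draw several rollouts and keep the distinct ones whose answer matches."""
    kept: List[Trajectory] = []
    seen = set()
    for _ in range(num_samples):
        traj = sample_fn(question)
        # Assumed correctness check: case-insensitive exact match on the answer.
        is_correct = traj.answer.strip().lower() == reference.strip().lower()
        if is_correct and traj not in seen:
            seen.add(traj)
            kept.append(traj)
    return kept


# Toy sampler that cycles through canned rollouts instead of calling a model.
canned = itertools.cycle([
    Trajectory("2+2?", ("calculator",), "4"),
    Trajectory("2+2?", ("search", "calculator"), "4"),
    Trajectory("2+2?", (), "5"),  # wrong answer, will be rejected
])
sft_data = rejection_sample("2+2?", "4", lambda q: next(canned), num_samples=6)
```

Keeping every distinct correct rollout, rather than a single best one, is one way to preserve varied tool-use paths in the fine-tuning set, which matches the stated goal of broadening exploration of the action space.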

Our contributions are listed as follows:

We provide a comprehensive quantitative analysis of erroneous behavioral patterns in TIR. Inspired by this, we propose ET-Agent, a framework for optimizing TIR's behavioral patterns.
We introduce a self-evolving data flywheel, an iterative loop in which the model continuously refines its previous trajectories. This mechanism effectively expands the model's coverage of the action space beyond its initial exploration.
Based on the flywheel, we present a behavior calibration training framework with two phases, aiming to calibrate the model's exploration in tool-use action space to optimal trajectories.
Extensive experiments demonstrate that ET-Agent substantially improves behavioral efficiency, reasoning conciseness, and execution success rates while maintaining high accuracy.
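
The flywheel idea in the contributions above (an iterative loop in which previous trajectories are refined and added back to the data) can be sketched as follows. This is only an illustration under stated assumptions: the `Traj` tuple, `run_flywheel`, the acceptance rule, and the toy `drop_repeated_call` refiner are all hypothetical, whereas in ET-Agent the refinement comes from the agent's own reflective exploration.

```python
from typing import Callable, List, Tuple

# A trajectory is modeled here as (tool_calls, answer); this is an illustrative
# stand-in, not the paper's representation.
Traj = Tuple[Tuple[str, ...], str]


def run_flywheel(
    seeds: List[Traj],
    refine_fn: Callable[[Traj], Traj],
    is_correct: Callable[[Traj], bool],
    rounds: int = 3,
) -> List[Traj]:
    """Iteratively refine the previous round's trajectories and grow the pool."""
    pool: List[Traj] = list(seeds)
    frontier: List[Traj] = list(seeds)
    for _ in range(rounds):
        next_frontier: List[Traj] = []
        for traj in frontier:
            candidate = refine_fn(traj)
            # Assumed acceptance rule: keep only new, correct refinements.
            if is_correct(candidate) and candidate not in pool:
                pool.append(candidate)
                next_frontier.append(candidate)
        if not next_frontier:
            break  # nothing new was produced, so the loop has converged
        frontier = next_frontier
    return pool


def drop_repeated_call(traj: Traj) -> Traj:
    """Toy refiner: remove the first tool call that repeats its predecessor."""
    calls, answer = traj
    for i in range(1, len(calls)):
        if calls[i] == calls[i - 1]:
            return calls[:i] + calls[i + 1:], answer
    return traj


seed = (("search", "search", "search", "calculator"), "42")
pool = run_flywheel([seed], drop_repeated_call, lambda t: t[1] == "42")
```

In this toy run the pool grows from one redundant trajectory to three progressively shorter ones, which is the sense in which the loop covers more of the action space than the initial rollout did.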