arXiv:2510.10197

Don't Just Fine-tune the Agent, Tune the Environment

Published on Oct 11 · Submitted by Siyuan Lu (SII) on Oct 14

Abstract

Large Language Model (LLM) agents show great promise for complex, multi-turn tool-use tasks, but their development is often hampered by the extreme scarcity of high-quality training data. Supervised fine-tuning (SFT) on synthetic data leads to overfitting, whereas standard reinforcement learning (RL) struggles with a critical cold-start problem and training instability. To address these challenges, we introduce Environment Tuning, a novel training paradigm that enables agents to learn complex behaviors directly from problem instances without relying on pre-collected expert trajectories. Environment Tuning orchestrates this learning process through a structured curriculum, actionable environment augmentation that provides corrective feedback, and fine-grained progress rewards to ensure stable and efficient exploration. Using only 400 problem instances from the Berkeley Function-Calling Leaderboard (BFCL) benchmark, our method not only achieves competitive in-distribution performance against strong baselines but also demonstrates superior out-of-distribution generalization, overcoming the performance collapse common to SFT-based approaches. Our work presents a paradigm shift from supervised fine-tuning on static trajectories to dynamic, environment-based exploration, paving the way for training more robust and data-efficient agents.

AI-generated summary

Environment Tuning enables LLM agents to learn complex behaviors from problem instances using a structured curriculum, environment augmentation, and progress rewards, achieving competitive in-distribution performance and superior out-of-distribution generalization.

Community

Paper author and submitter:

There are three critical challenges in training LLM agents for multi-turn tool use:

  1. Data scarcity: high-quality multi-turn datasets are extremely limited (e.g., BFCL V3 has only 800 samples).
  2. Complex environments: agents must navigate diverse tool ecosystems across multiple domains.
  3. Long interaction chains: success requires consistent performance across every turn, since a single failure leads to complete task failure.

To tackle these issues, we propose Environment Tuning, a novel paradigm that shifts focus from fine-tuning the agent to tuning the learning environment itself. Our method combines three key components: a structured 4-stage curriculum (from syntax mastery to full complexity), actionable environment augmentation that provides corrective hints instead of cryptic error messages, and fine-grained progress rewards that replace sparse binary feedback with dense turn-by-turn signals.
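
To make the three components more concrete, here is a minimal Python sketch of how a staged curriculum, corrective-hint augmentation, and a dense progress reward could be expressed. This is not the paper's actual implementation: the stage parameters, class and function names, and reward weights below are all illustrative assumptions.

```python
# Illustrative sketch only -- not the paper's released code. It assumes a
# simple per-turn tool-use environment; every name and constant below is a
# hypothetical stand-in for the components described above.

from dataclasses import dataclass


@dataclass
class Stage:
    """One step of the structured curriculum."""
    name: str
    max_tools: int   # how much of the tool ecosystem is exposed
    max_turns: int   # allowed interaction-chain length at this stage


# Structured 4-stage curriculum: from syntax mastery to full complexity.
CURRICULUM = [
    Stage("syntax_mastery",  max_tools=1,  max_turns=1),
    Stage("single_domain",   max_tools=5,  max_turns=4),
    Stage("multi_domain",    max_tools=20, max_turns=8),
    Stage("full_complexity", max_tools=50, max_turns=16),
]


def augment_error(raw_error: str) -> str:
    """Actionable environment augmentation: turn a cryptic error into a corrective hint."""
    if "missing required parameter" in raw_error:
        return raw_error + " Hint: re-read the tool schema and pass every required argument."
    if "unknown tool" in raw_error:
        return raw_error + " Hint: only call tools listed in the current tool set."
    return raw_error  # no rule matched; return the original message


def progress_reward(completed_subgoals: int, total_subgoals: int, task_solved: bool) -> float:
    """Fine-grained progress reward: dense turn-by-turn signal instead of sparse binary feedback."""
    if task_solved:
        return 1.0
    return 0.5 * completed_subgoals / max(total_subgoals, 1)


if __name__ == "__main__":
    print(augment_error("unknown tool 'get_weathr'"))
    print(progress_reward(completed_subgoals=2, total_subgoals=4, task_solved=False))
```

In practice, the partial-credit weight and the hint rules would come from the environment's own per-turn checks rather than the hand-set values used in this toy example.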

The experimental results are quite striking: using only 400 training samples, we boost Qwen2.5-7B-Instruct from 7% to 37% on BFCL V3, and more importantly, demonstrate superior out-of-distribution generalization where traditional SFT methods often collapse. For instance, on ACEBench Agent, we nearly double ToolACE-2's performance from 8.5% to 15.0%. This suggests that learning through dynamic environmental interaction fosters more robust generalization than training on static trajectories - a compelling insight for the future of agent training in data-scarce scenarios.

[Table 1]
