arXiv:2510.10197

Don't Just Fine-tune the Agent, Tune the Environment

Published on Oct 11 · Submitted by Siyuan Lu (SII) on Oct 14

Abstract

Large Language Model (LLM) agents show great promise for complex, multi-turn tool-use tasks, but their development is often hampered by the extreme scarcity of high-quality training data. Supervised fine-tuning (SFT) on synthetic data leads to overfitting, whereas standard reinforcement learning (RL) struggles with a critical cold-start problem and training instability. To address these challenges, we introduce Environment Tuning, a novel training paradigm that enables agents to learn complex behaviors directly from problem instances without relying on pre-collected expert trajectories. Environment Tuning orchestrates this learning process through a structured curriculum, actionable environment augmentation that provides corrective feedback, and fine-grained progress rewards to ensure stable and efficient exploration. Using only 400 problem instances from the Berkeley Function-Calling Leaderboard (BFCL) benchmark, our method not only achieves competitive in-distribution performance against strong baselines but also demonstrates superior out-of-distribution generalization, overcoming the performance collapse common to SFT-based approaches. Our work presents a paradigm shift from supervised fine-tuning on static trajectories to dynamic, environment-based exploration, paving the way for training more robust and data-efficient agents.

AI-generated summary

Environment Tuning enables LLM agents to learn complex behaviors from problem instances using a structured curriculum, environment augmentation, and progress rewards, achieving competitive in-distribution performance and superior out-of-distribution generalization.

Community

Paper author and submitter:

There are three critical challenges in training LLM agents for multi-turn tool use:

  1. Data scarcity: high-quality multi-turn datasets are extremely limited (e.g., BFCL V3 has only 800 samples).
  2. Complex environments: agents must navigate diverse tool ecosystems across multiple domains.
  3. Long interaction chains: success requires consistent performance across every turn, since a single failure leads to complete task failure.

To tackle these issues, we propose Environment Tuning, a novel paradigm that shifts focus from fine-tuning the agent to tuning the learning environment itself. Our method combines three key components: a structured 4-stage curriculum (from syntax mastery to full complexity), actionable environment augmentation that provides corrective hints instead of cryptic error messages, and fine-grained progress rewards that replace sparse binary feedback with dense turn-by-turn signals.
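
To make the three components more concrete, here is a minimal Python sketch of how a staged curriculum, corrective-hint augmentation, and a dense progress reward could be expressed. This is not the paper's actual implementation: the stage parameters, class and function names, and reward weights below are all illustrative assumptions.

```python
# Illustrative sketch only -- not the paper's released code. It assumes a
# simple per-turn tool-use environment; every name and constant below is a
# hypothetical stand-in for the components described above.

from dataclasses import dataclass


@dataclass
class Stage:
    """One step of the structured curriculum."""
    name: str
    max_tools: int   # how much of the tool ecosystem is exposed
    max_turns: int   # allowed interaction-chain length at this stage


# Structured 4-stage curriculum: from syntax mastery to full complexity.
CURRICULUM = [
    Stage("syntax_mastery",  max_tools=1,  max_turns=1),
    Stage("single_domain",   max_tools=5,  max_turns=4),
    Stage("multi_domain",    max_tools=20, max_turns=8),
    Stage("full_complexity", max_tools=50, max_turns=16),
]


def augment_error(raw_error: str) -> str:
    """Actionable environment augmentation: turn a cryptic error into a corrective hint."""
    if "missing required parameter" in raw_error:
        return raw_error + " Hint: re-read the tool schema and pass every required argument."
    if "unknown tool" in raw_error:
        return raw_error + " Hint: only call tools listed in the current tool set."
    return raw_error  # no rule matched; return the original message


def progress_reward(completed_subgoals: int, total_subgoals: int, task_solved: bool) -> float:
    """Fine-grained progress reward: dense turn-by-turn signal instead of sparse binary feedback."""
    if task_solved:
        return 1.0
    return 0.5 * completed_subgoals / max(total_subgoals, 1)


if __name__ == "__main__":
    print(augment_error("unknown tool 'get_weathr'"))
    print(progress_reward(completed_subgoals=2, total_subgoals=4, task_solved=False))
```

In practice, the partial-credit weight and the hint rules would come from the environment's own per-turn checks rather than the hand-set values used in this toy example.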

The experimental results are quite striking: using only 400 training samples, we boost Qwen2.5-7B-Instruct from 7% to 37% on BFCL V3, and more importantly, demonstrate superior out-of-distribution generalization where traditional SFT methods often collapse. For instance, on ACEBench Agent, we nearly double ToolACE-2's performance from 8.5% to 15.0%. This suggests that learning through dynamic environmental interaction fosters more robust generalization than training on static trajectories - a compelling insight for the future of agent training in data-scarce scenarios.

[Table 1]
