Abstract
Enabling effective collaboration among LLMs is a crucial step toward developing autonomous systems capable of solving complex problems. While LLMs are typically used as single-model generators, with humans critiquing and refining their outputs, the potential for jointly trained collaborative models remains largely unexplored. Despite promising results in multi-agent communication and debate settings, little progress has been made in training models to work together on tasks. In this paper, we present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems. Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles: a generator, a verifier, and a refinement model that iteratively solve problems. We propose a trajectory-expansion-based synthetic data generation process and a credit assignment strategy driven by joint outcome-based rewards. This enables our post-training setup to use both positive and negative trajectories to autonomously improve each model's specialized capabilities as part of a joint sequential system. We evaluate our approach on MATH, GSM8K, and CommonsenseQA (CQA), where MALT with Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40%, respectively, over the same baseline model. This demonstrates an early advance in multi-agent cooperative capabilities on mathematical and commonsense reasoning questions. More generally, our work provides a concrete direction for research on multi-agent LLM training approaches.
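To make the described pipeline concrete, here is a minimal sketch of the sequential generator → verifier → refinement loop, the trajectory expansion, and the shared outcome-based reward. Everything below is an illustrative assumption, not the paper's implementation: the names `sample_llm`, `is_correct`, and `BRANCH`, the branching factor, and the exact-match outcome check are all hypothetical placeholders.

```python
# Sketch of a MALT-style sequential multi-agent pipeline with trajectory
# expansion. Hypothetical names and branching factor; not the authors' code.
from dataclasses import dataclass
from typing import List

BRANCH = 3  # assumed branching factor at each stage of the pipeline


@dataclass
class Trajectory:
    question: str
    answer: str    # generator output
    critique: str  # verifier output
    refined: str   # refinement-model output
    reward: int    # joint outcome-based reward shared by all three roles


def sample_llm(role: str, prompt: str, n: int) -> List[str]:
    """Placeholder: sample n completions from the role-specialized LLM."""
    raise NotImplementedError


def is_correct(answer: str, gold: str) -> bool:
    """Placeholder outcome check (e.g., exact match on the final answer)."""
    return answer.strip() == gold.strip()


def expand_trajectories(question: str, gold: str) -> List[Trajectory]:
    """Branch at every stage (generator -> verifier -> refiner) to build a
    tree of trajectories, then label each leaf with the outcome reward."""
    trajectories = []
    for ans in sample_llm("generator", question, BRANCH):
        for crit in sample_llm("verifier", f"{question}\n{ans}", BRANCH):
            for ref in sample_llm("refiner", f"{question}\n{ans}\n{crit}", BRANCH):
                reward = int(is_correct(ref, gold))
                trajectories.append(Trajectory(question, ans, crit, ref, reward))
    return trajectories


def credit_assign(trajs: List[Trajectory]):
    """Split trajectories by the shared outcome reward so each role can be
    post-trained on its own positive and negative examples (e.g., SFT on
    positives, preference optimization on positive/negative pairs)."""
    positives = [t for t in trajs if t.reward == 1]
    negatives = [t for t in trajs if t.reward == 0]
    return positives, negatives
```

Because every leaf inherits the same outcome reward, the expansion tree lets each upstream role (generator, verifier) be credited or penalized based on how its output affected the final answers downstream, which is one plausible reading of the credit assignment strategy described above.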
Community
We propose a multi-agent LLM training approach (MALT) that specializes a base LLM into different roles - a generator, a verifier, and a refinement model - to collaboratively solve reasoning tasks through iterative problem-solving. We use synthetic data generation and outcome-based rewards to improve each model's capabilities, demonstrating significant performance gains on mathematical and commonsense reasoning benchmarks.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Multi-Agent Large Language Models for Conversational Task-Solving (2024)
- Mars-PO: Multi-Agent Reasoning System Preference Optimization (2024)
- Training Language Models to Critique With Multi-agent Feedback (2024)
- Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning (2024)
- MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation (2024)
- Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models (2024)
- LLM-PySC2: Starcraft II learning environment for Large Language Models (2024)