A New Fine-Tuning Approach for LLMs Using Evolution Strategies

Community Article · Published October 7, 2025

Today's state-of-the-art large language models (LLMs) rely on Reinforcement Learning (RL) for fine-tuning. Fine-tuning is crucial because it adapts these models to specific tasks, domains, and human values, making them more useful, accurate, and aligned in real-world applications.

But RL has well-known limitations: it is computationally expensive (some models can cost millions of dollars to tune), difficult to scale efficiently, and prone to instability and reward hacking. These challenges make it harder to improve LLMs in a reliable and cost-effective way as models grow larger.

A New Fine-Tuning Approach

The Cognizant AI Lab presents a new alternative to RL: Evolution Strategies (ES). For the first time, we successfully scaled ES to optimize billions of parameters simultaneously, enabling full-parameter fine-tuning of LLMs. The results are striking: ES can outperform state-of-the-art RL methods on key dimensions such as sample efficiency, tolerance to long-horizon rewards, and robustness to different base LLMs, while showing less tendency toward reward hacking and more stable performance across runs.
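To make the idea concrete, here is a minimal, illustrative sketch of a generic (OpenAI-style) ES loop in Python. This is not the paper's implementation: the names `es_finetune`, `reward_fn`, `sigma`, `alpha`, and `population` are placeholder assumptions, and a real LLM run would perturb billions of parameters and score each perturbed model on the task reward.

```python
import numpy as np

def es_finetune(theta, reward_fn, sigma=0.02, alpha=0.01,
                population=32, iterations=100, seed=0):
    """Generic (OpenAI-style) Evolution Strategies loop on a flat parameter vector.

    theta     : 1-D numpy array of parameters (for an LLM, billions of entries)
    reward_fn : maps a parameter vector to a scalar reward (forward passes only)
    sigma     : standard deviation of the Gaussian perturbations
    alpha     : step size for the parameter update
    """
    rng = np.random.default_rng(seed)
    for _ in range(iterations):
        # Sample a population of Gaussian perturbation directions.
        eps = rng.standard_normal((population, theta.size))
        # Antithetic evaluation: score each direction with + and - signs,
        # which lowers the variance of the update.
        scores = np.array([
            reward_fn(theta + sigma * e) - reward_fn(theta - sigma * e)
            for e in eps
        ])
        # Rank-normalize scores so the update is insensitive to reward scale.
        ranks = scores.argsort().argsort().astype(np.float64)
        weights = ranks / (population - 1) - 0.5
        # Move theta along the reward-weighted average of the noise directions.
        theta = theta + alpha * (weights @ eps) / (population * sigma)
    return theta

# Toy usage: recover a 10-dimensional target vector by maximizing a quadratic reward.
target = np.linspace(-1.0, 1.0, 10)
theta_hat = es_finetune(np.zeros(10), lambda w: -np.sum((w - target) ** 2))
```

Note that the loop only needs forward evaluations of the reward, never backpropagation through the model; this gradient-free property is what makes ES attractive for simplifying fine-tuning at scale.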

Why It Matters

This research establishes Evolution Strategies (ES) as a practical, scalable, and stable alternative to Reinforcement Learning (RL) for fine-tuning large language models. In the future, it could simplify training by removing gradient calculations and unlock new possibilities for reasoning incentivization, tasks that require exploration, safety alignment, and continual learning.

Read the blog

Read the paper
