RL from Synthetic Feedback

community

nlile

authored a paper 7 months ago

Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning

Paper • 2506.05256 • Published Jun 5, 2025 • 2

nlile

authored 2 papers 11 months ago

Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Paper • 2502.17387 • Published Feb 24, 2025 • 7

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Paper • 2503.01307 • Published Mar 3, 2025 • 38

nlile

authored a paper about 1 year ago

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8, 2025 • 100

nlile

authored 2 papers over 1 year ago

Generative Reward Models

Paper • 2410.12832 • Published Oct 2, 2024 • 7

PERSONA: A Reproducible Testbed for Pluralistic Alignment

Paper • 2407.17387 • Published Jul 24, 2024 • 20

nlile

authored a paper almost 2 years ago

Suppressing Pink Elephants with Direct Principle Feedback

Paper • 2402.07896 • Published Feb 12, 2024 • 11