Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes Paper • 2603.25562 • Published Mar 26 • 19
DRAGON: Distributional Rewards Optimize Diffusion Generative Models Paper • 2504.15217 • Published Apr 21, 2025 • 11
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published Oct 29, 2025 • 48
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 27 days ago • 195
Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories Paper • 2606.03979 • Published 6 days ago • 25