RL - an Andynsn Collection

updated 2 days ago

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
Paper • 2603.25562 • Published Mar 26 • 19

Self-Distilled Agentic Reinforcement Learning
Paper • 2605.15155 • Published 25 days ago • 111

DRAGON: Distributional Rewards Optimize Diffusion Generative Models
Paper • 2504.15217 • Published Apr 21, 2025 • 11

Diffusion Policy Policy Optimization
Paper • 2409.00588 • Published Sep 1, 2024 • 20

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Paper • 2510.25992 • Published Oct 29, 2025 • 48

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
Paper • 2605.11609 • Published 27 days ago • 195

Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
Paper • 2606.03979 • Published 6 days ago • 25

Self-Distilled Policy Gradient
Paper • 2606.04036 • Published 6 days ago • 22