Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation Paper • 2509.25849 • Published 18 days ago • 46
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing Paper • 2509.26346 • Published 17 days ago • 17
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models Paper • 2509.19803 • Published 24 days ago • 116