TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published 9 days ago • 47
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published Sep 3 • 21
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Paper • 2507.13158 • Published Jul 17 • 24
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models Paper • 2506.18945 • Published Jun 23 • 40
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces Paper • 2506.00123 • Published May 30 • 34 • 5
AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models Paper • 2505.22662 • Published May 28 • 6
MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning Paper • 2505.24846 • Published May 30 • 15
MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning Paper • 2505.24846 • Published May 30 • 15