Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
Paper • 2602.12036 • Published • 92
The official organization of Tencent Hunyuan team
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation