NILE: Internal Consistency Alignment in Large Language Models Paper • 2412.16686 • Published about 1 month ago • 8
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 38
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling Paper • 2412.15084 • Published Dec 19, 2024 • 13