Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Paper • 2501.18585 • Published 9 days ago • 51
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning Paper • 2410.06508 • Published Oct 9, 2024 • 10
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search Paper • 2410.03864 • Published Oct 4, 2024 • 11
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows Paper • 2409.17433 • Published Sep 25, 2024 • 9
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28, 2024 • 97
Stabilizing RLHF through Advantage Model and Selective Rehearsal Paper • 2309.10202 • Published Sep 18, 2023 • 10
The Trickle-down Impact of Reward (In-)consistency on RLHF Paper • 2309.16155 • Published Sep 28, 2023
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18, 2024 • 55