- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 24
- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 22
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 101
saeed abhari
galois77
AI & ML interests
None yet
Recent Activity
updated a collection 1 day ago: Reasoning
updated a collection 1 day ago: Reasoning
updated a collection 1 day ago: Reasoning
Organizations
None yet
Collections
5
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 51
- Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization
  Paper • 2412.18279 • Published
- Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
  Paper • 2501.10799 • Published • 14
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning
  Paper • 2501.19324 • Published • 34
models
None public yet
datasets
None public yet