Reasoning Papers - a wangbing1416 Collection

updated about 15 hours ago

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Paper • 2508.07629 • Published 24 days ago • 39
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

Paper • 2508.07101 • Published 26 days ago • 13
Compressing Chain-of-Thought in LLMs via Step Entropy

Paper • 2508.03346 • Published about 1 month ago • 7
Train Long, Think Short: Curriculum Learning for Efficient Reasoning

Paper • 2508.08940 • Published 23 days ago • 24
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning

Paper • 2508.09726 • Published 22 days ago • 13
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published 21 days ago • 26
Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information

Paper • 2508.11252 • Published 20 days ago • 3
Deep Think with Confidence

Paper • 2508.15260 • Published 14 days ago • 81
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

Paper • 2508.14029 • Published 16 days ago • 116
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning

Paper • 2508.15868 • Published 15 days ago • 3
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Paper • 2508.16949 • Published 12 days ago • 22
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published 11 days ago • 76
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published 9 days ago • 14
StepWiser: Stepwise Generative Judges for Wiser Reasoning

Paper • 2508.19229 • Published 9 days ago • 19
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic

Paper • 2509.01363 • Published 3 days ago • 27
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

Paper • 2509.02522 • Published 2 days ago • 20