reasoning-agentic - a sheikhjubair Collection

updated 18 days ago

OpenAI o1 System Card

Paper • 2412.16720 • Published Dec 21, 2024 • 34
LearnLM: Improving Gemini for Learning

Paper • 2412.16429 • Published Dec 21, 2024 • 22
NILE: Internal Consistency Alignment in Large Language Models

Paper • 2412.16686 • Published Dec 21, 2024 • 8
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 39
Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 374
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling

Paper • 2412.15084 • Published Dec 19, 2024 • 13
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Paper • 2503.16419 • Published Mar 20 • 76
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't

Paper • 2503.16219 • Published Mar 20 • 52
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset

Paper • 2504.16891 • Published Apr 23 • 24
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 20
ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 46
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 61
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Paper • 2504.11456 • Published Apr 15 • 13
Reasoning Models Can Be Effective Without Thinking

Paper • 2504.09858 • Published Apr 14 • 12
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale

Paper • 2505.08311 • Published May 13 • 18
Are Reasoning Models More Prone to Hallucination?

Paper • 2505.23646 • Published May 29 • 25
ATLAS: Learning to Optimally Memorize the Context at Test Time

Paper • 2505.23735 • Published May 29 • 23
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning

Paper • 2505.20561 • Published May 26 • 7
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published 22 days ago • 26