Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5 • 130
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs Paper • 2506.07240 • Published Jun 8 • 7
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Paper • 2506.09991 • Published Jun 11 • 56
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Paper • 2506.06941 • Published Jun 7 • 14
s3: You Don't Need That Much Data to Train a Search Agent via RL Paper • 2505.14146 • Published May 20 • 18
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published Jun 13 • 70
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Paper • 2506.14245 • Published Jun 17 • 42
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30 • 49
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published Jul 1 • 76
Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models Paper • 2507.14241 • Published Jul 17 • 17
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization Paper • 2507.15061 • Published Jul 20 • 54
Replacing thinking with tool usage enables reasoning in small language models Paper • 2507.05065 • Published Jul 7 • 15
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Paper • 2507.13158 • Published Jul 17 • 24
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models Paper • 2508.10751 • Published 22 days ago • 26
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published 16 days ago • 42
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published 15 days ago • 130
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications Paper • 2508.16279 • Published 14 days ago • 26
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles Paper • 2508.16072 • Published 15 days ago • 3
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges Paper • 2508.18076 • Published 11 days ago • 5
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation Paper • 2410.20774 • Published Oct 28, 2024
Provable Benefits of In-Tool Learning for Large Language Models Paper • 2508.20755 • Published 8 days ago • 9