Papers - a yenson-lau Collection

Papers

updated 6 days ago

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5 • 130
Magistral

Paper • 2506.10910 • Published Jun 12 • 64
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs

Paper • 2506.07240 • Published Jun 8 • 7
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Paper • 2506.09991 • Published Jun 11 • 56
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 260
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Paper • 2506.06941 • Published Jun 7 • 14
s3: You Don't Need That Much Data to Train a Search Agent via RL

Paper • 2505.14146 • Published May 20 • 18
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

Paper • 2506.11763 • Published Jun 13 • 70
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Paper • 2506.14245 • Published Jun 17 • 42
Reasoning with Exploration: An Entropy Perspective

Paper • 2506.14758 • Published Jun 17 • 30
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published Jun 30 • 49
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1 • 76
Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models

Paper • 2507.14241 • Published Jul 17 • 17
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

Paper • 2507.15061 • Published Jul 20 • 54
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 284
Replacing thinking with tool usage enables reasoning in small language models

Paper • 2507.05065 • Published Jul 7 • 15
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities

Paper • 2507.13158 • Published Jul 17 • 24
Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published Jul 21 • 61
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published 22 days ago • 26
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Paper • 2508.14704 • Published 16 days ago • 42
Deep Think with Confidence

Paper • 2508.15260 • Published 16 days ago • 81
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published 15 days ago • 130
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

Paper • 2508.16279 • Published 14 days ago • 26
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

Paper • 2508.16072 • Published 15 days ago • 3
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges

Paper • 2508.18076 • Published 11 days ago • 5
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation

Paper • 2410.20774 • Published Oct 28, 2024
Provable Benefits of In-Tool Learning for Large Language Models

Paper • 2508.20755 • Published 8 days ago • 9