InteractComp: Evaluating Search Agents With Ambiguous Queries Paper • 2510.24668 • Published 11 days ago • 96
Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting Paper • 2505.19716 • Published May 26 • 4
You Don't Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation Paper • 2508.14104 • Published Aug 17 • 1
VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering Paper • 2510.10828 • Published 26 days ago • 1
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published 12 days ago • 118
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published Mar 10 • 16
Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search Paper • 2502.17248 • Published Feb 24 • 1
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence Paper • 2507.21046 • Published Jul 28 • 81