ScholarGym: Benchmarking Deep Research Workflows on Academic Literature Retrieval Paper • 2601.21654 • Published Jan 29
Can Deep Research Agents Find and Organize? Evaluating the Synthesis Gap with Expert Taxonomies Paper • 2601.12369 • Published Jan 18 • 4
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks Paper • 2508.15804 • Published Aug 14, 2025 • 15
ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry Paper • 2507.16280 • Published Jul 22, 2025 • 1
DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis Paper • 2508.20033 • Published Aug 27, 2025 • 10
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published Jun 13, 2025 • 74
SAGE: Benchmarking and Improving Retrieval for Deep Research Agents Paper • 2602.05975 • Published Feb 5 • 12
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning Paper • 2603.03790 • Published Mar 2026 • 112