$OneMillion-Bench: How Far are Language Agents from Human Experts? Paper • 2603.07980 • Published 9 days ago
PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives Paper • 2505.19558 • Published May 26, 2025