DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published 1 day ago • 101
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents Paper • 2512.22047 • Published 21 days ago • 26
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them Paper • 2509.21117 • Published Sep 25, 2025 • 29
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14, 2025 • 145