TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages Paper • 2502.11020 • Published Feb 16
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents Paper • 2505.20411 • Published May 26