SciCode: A Research Coding Benchmark Curated by Scientists • Paper • 2407.13168 • Published Jul 18
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? • Paper • 2407.15711 • Published Jul 22
The Vision of Autonomic Computing: Can LLMs Make It a Reality? • Paper • 2407.14402 • Published Jul 19