CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published 4 days ago
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale Paper • 2602.23866 • Published 5 days ago
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published 2 days ago
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization Paper • 2602.23008 • Published 6 days ago
PyVision-RL: Forging Open Agentic Vision Models via RL Paper • 2602.20739 • Published 8 days ago
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published 7 days ago
SkillOrchestra: Learning to Route Agents via Skill Transfer Paper • 2602.19672 • Published 9 days ago
CADEvolve: Creating Realistic CAD via Program Evolution Paper • 2602.16317 • Published 14 days ago
AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines Paper • 2602.14296 • Published 16 days ago
Discovering Multiagent Learning Algorithms with Large Language Models Paper • 2602.16928 • Published 13 days ago
HLE-Verified: A Systematic Verification and Structured Revision of Humanity's Last Exam Paper • 2602.13964 • Published 17 days ago
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published 19 days ago