SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 2 days ago • 18
PixelSmile: Toward Fine-Grained Facial Expression Editing Paper • 2603.25728 • Published 1 day ago • 94
Cohere Multilingual ASR 🎙: Transcribe audio clips to text in many languages Space • Running on CPU Upgrade • 32
Qworld: Question-Specific Evaluation Criteria for LLMs Paper • 2603.23522 • Published 3 days ago • 9
LagerNVS: Latent Geometry for Fully Neural Real-time Novel View Synthesis Paper • 2603.20176 • Published 7 days ago • 8
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 2 days ago • 83
Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments Paper • 2603.23638 • Published 3 days ago • 9
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience Paper • 2603.24533 • Published 2 days ago • 35
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate Paper • 2504.19874 • Published Apr 28, 2025 • 11
GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents Paper • 2603.24329 • Published 2 days ago • 17
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published 2 days ago • 37
DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models Paper • 2603.23499 • Published 3 days ago • 46
TrajLoom: Dense Future Trajectory Generation from Video Paper • 2603.22606 • Published 4 days ago • 5
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models Paper • 2603.22003 • Published 4 days ago • 11
Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos Paper • 2603.22529 • Published 4 days ago • 5
ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model Paper • 2603.22281 • Published 4 days ago • 13