V_{0.5}: Generalist Value Model as a Prior for Sparse RL Rollouts Paper • 2603.10848 • Published Mar 11 • 16
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions Paper • 2605.27141 • Published 8 days ago • 19
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents Paper • 2605.25624 • Published 9 days ago • 31
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games Paper • 2506.03610 • Published Jun 4, 2025 • 10
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 14 days ago • 204
SOD: Step-wise On-policy Distillation for Small Language Model Agents Paper • 2605.07725 • Published 26 days ago • 25
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment Paper • 2605.19577 • Published 15 days ago • 58
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 291
Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context Paper • 2605.13831 • Published 21 days ago • 87
Learning to Self-Verify Makes Language Models Better Reasoners Paper • 2602.07594 • Published Feb 7 • 3
Collaborative Multi-Agent Optimization for Personalized Memory System Paper • 2603.12631 • Published Mar 13
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning Paper • 2605.06130 • Published 27 days ago • 111
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria Paper • 2605.08354 • Published 26 days ago • 23
Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation Paper • 2605.03849 • Published 29 days ago • 126
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published Apr 24 • 227