Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published Aug 28 • 89
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published Aug 24 • 80
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space Paper • 2508.19247 • Published Aug 26 • 41
USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning Paper • 2508.18966 • Published Aug 26 • 56
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2 • 222
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2 • 83
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published Aug 31 • 83
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1 • 73
Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling Paper • 2509.00605 • Published Aug 30 • 42
DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks Paper • 2509.01396 • Published Sep 1 • 56
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Paper • 2510.12276 • Published 29 days ago • 142
Agent Lightning: Train ANY AI Agents with Reinforcement Learning Paper • 2508.03680 • Published Aug 5 • 114
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer Paper • 2510.25976 • Published 13 days ago • 11
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization Paper • 2510.25616 • Published 13 days ago • 89
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published 7 days ago • 96
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published 7 days ago • 52
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published 5 days ago • 185
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms Paper • 2511.04217 • Published 6 days ago • 14
HaluMem: Evaluating Hallucinations in Memory Systems of Agents Paper • 2511.03506 • Published 6 days ago • 71
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction Paper • 2511.07327 • Published 1 day ago • 57
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization Paper • 2511.06411 • Published 2 days ago • 13