Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models Paper • 2508.09138 • Published 23 days ago • 36
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models Paper • 2508.09138 • Published 23 days ago • 36
OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks Paper • 2508.05614 • Published 28 days ago • 19 • 2
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models Paper • 2508.05613 • Published 28 days ago • 17
OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks Paper • 2508.05614 • Published 28 days ago • 19
Test-Time Reinforcement Learning for GUI Grounding via Region Consistency Paper • 2508.05615 • Published 28 days ago • 21
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models Paper • 2508.05613 • Published 28 days ago • 17
Test-Time Reinforcement Learning for GUI Grounding via Region Consistency Paper • 2508.05615 • Published 28 days ago • 21
OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks Paper • 2508.05614 • Published 28 days ago • 19
view article Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face By abidlabs and 4 others • Jul 29 • 167
Hierarchical Budget Policy Optimization for Adaptive Reasoning Paper • 2507.15844 • Published Jul 21 • 16 • 2
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization Paper • 2507.15758 • Published Jul 21 • 34 • 1
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task Paper • 2502.11684 • Published Feb 17 • 2
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task Paper • 2502.11684 • Published Feb 17 • 2
SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation Paper • 2506.03139 • Published Jun 3 • 16
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization Paper • 2402.17574 • Published Feb 27, 2024