Training Data Efficiency in Multimodal Process Reward Models Paper • 2602.04145 • Published 9 days ago • 75
TTCS: Test-Time Curriculum Synthesis for Self-Evolving Paper • 2601.22628 • Published 14 days ago • 34
Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing Paper • 2602.03845 • Published 9 days ago • 25
RelayLLM: Efficient Reasoning via Collaborative Decoding Paper • 2601.05167 • Published Jan 8 • 31
Guided Self-Evolving LLMs with Minimal Human Supervision Paper • 2512.02472 • Published Dec 2, 2025 • 54
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 61
VisPlay: Self-Evolving Vision-Language Models from Images Paper • 2511.15661 • Published Nov 19, 2025 • 43
TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar Paper • 2510.14972 • Published Oct 16, 2025 • 35
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning Paper • 2509.22824 • Published Sep 26, 2025 • 21
VideoScore2: Think before You Score in Generative Video Evaluation Paper • 2509.22799 • Published Sep 26, 2025 • 26
Interactive Training: Feedback-Driven Neural Network Optimization Paper • 2510.02297 • Published Oct 2, 2025 • 43
Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls Paper • 2510.00184 • Published Sep 30, 2025 • 17
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9, 2025 • 103
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1, 2025 • 78
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27, 2025 • 84
R-Zero: Self-Evolving Reasoning LLM from Zero Data Paper • 2508.05004 • Published Aug 7, 2025 • 130