Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? Paper • 2509.04292 • Published 1 day ago • 41
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published 5 days ago • 59 • 4
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published 5 days ago • 59
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published 4 days ago • 99
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published 4 days ago • 76
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning Paper • 2509.01644 • Published 5 days ago • 24
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published 5 days ago • 59
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published 5 days ago • 59 • 4
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published 12 days ago • 179
VerlTool/pixel_reasoner-7b-grpo-n8-b128-t1.0-lr1e-6-complex-reward-new_global_step_50 8B • Updated 12 days ago • 202
VerlTool/deepsearch-qwen_qwen3-8b-grpo-n16-b128-t1.0-lr1e-6-new_global_step_70 8B • Updated 12 days ago • 363
VerlTool/pixel_reasoner-7b-grpo-n8-b128-t1.0-lr1e-6-complex-reward-new_global_step_50 8B • Updated 12 days ago • 202
VerlTool/deepsearch-qwen_qwen3-8b-grpo-n16-b128-t1.0-lr1e-6-new_global_step_70 8B • Updated 12 days ago • 363