PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing Paper • 2601.21957 • Published 11 days ago • 19
Unified Personalized Reward Model for Vision Generation Paper • 2602.02380 • Published 7 days ago • 19
UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing Paper • 2602.02437 • Published 7 days ago • 75
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning Paper • 2510.27606 • Published Oct 31, 2025 • 30
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28, 2025 • 72
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models Paper • 2508.00819 • Published Aug 1, 2025 • 63
Adaptive Fast-and-Slow Visual Program Reasoning for Long-Form VideoQA Paper • 2509.17743 • Published Sep 22, 2025
$\text{G}^2$RPO: Granular GRPO for Precise Reward in Flow Models Paper • 2510.01982 • Published Oct 2, 2025 • 7
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers Paper • 2305.17455 • Published May 27, 2023
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation Paper • 2510.18701 • Published Oct 21, 2025 • 67
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence Paper • 2510.24693 • Published Oct 28, 2025 • 19
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence Paper • 2510.24693 • Published Oct 28, 2025 • 19
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27, 2025 • 179
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation Paper • 2510.18701 • Published Oct 21, 2025 • 67
G^2RPO: Granular GRPO for Precise Reward in Flow Models Paper • 2510.01982 • Published Oct 2, 2025 • 7
$\text{G}^2$RPO: Granular GRPO for Precise Reward in Flow Models Paper • 2510.01982 • Published Oct 2, 2025 • 7 • 2
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26, 2025 • 143
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Paper • 2508.20096 • Published Aug 27, 2025 • 37
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published Aug 28, 2025 • 89
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published Aug 28, 2025 • 89