GutenOCR: A Grounded Vision-Language Front-End for Documents • Paper • 2601.14490 • Published 9 days ago • 35
UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation • Paper • 2601.11522 • Published 13 days ago • 17
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning • Paper • 2601.10129 • Published 15 days ago • 11
LSRIF: Logic-Structured Reinforcement Learning for Instruction Following • Paper • 2601.06431 • Published 20 days ago • 12
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding • Paper • 2601.10611 • Published 14 days ago • 26
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs • Paper • 2601.08763 • Published 16 days ago • 143
Urban Socio-Semantic Segmentation with Vision-Language Reasoning • Paper • 2601.10477 • Published 14 days ago • 155