MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning Paper • 2601.21468 • Published 9 days ago • 20
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding Paper • 2602.01785 • Published 5 days ago • 91
Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR Paper • 2602.05261 • Published 3 days ago • 45
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought Paper • 2601.23184 • Published 8 days ago • 34
AgentOCR: Reimagining Agent History via Optical Self-Compression Paper • 2601.04786 • Published about 1 month ago • 29
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 255
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14, 2025 • 145
Slow Perception: Let's Perceive Geometric Figures Step-by-step Paper • 2412.20631 • Published Dec 30, 2024 • 15
Document AI Collection All the papers that can fundamentally help in building a truly open-source processing pipeline. • 1 item • Updated Nov 11, 2024 • 1
Focus Anywhere for Fine-grained Multi-page Document Understanding Paper • 2405.14295 • Published May 23, 2024 • 1
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 10 items • Updated Dec 23, 2025 • 85
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 83
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published Jun 24, 2024 • 57