StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published 6 days ago • 41
VChain: Chain-of-Visual-Thought for Reasoning in Video Generation Paper • 2510.05094 • Published 10 days ago • 34
LLaVA-OneVision Collection a model good at arbitrary types of visual input • 17 items • Updated 30 days ago • 31
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9 • 98
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search Paper • 2509.07969 • Published Sep 9 • 59
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2 • 83
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding Paper • 2507.15028 • Published Jul 20 • 21
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17 • 75
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17 • 257
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16 • 42
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning Paper • 2507.05920 • Published Jul 8 • 11
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published Feb 13 • 43
Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published Dec 18, 2024 • 14
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 118