SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published 5 days ago • 35
VisionArena: 230K Real World User-VLM Conversations with Preference Labels Paper • 2412.08687 • Published 6 days ago • 11
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published 11 days ago • 43
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 8 days ago • 54
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published 9 days ago • 62
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published 11 days ago • 110
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published 12 days ago • 45
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published Nov 15 • 61
InternVL 2.5 Collection • Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling • 18 items • Updated about 16 hours ago • 67
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Paper • 2411.04923 • Published Nov 7 • 20
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7 • 110
Rethinking Data Selection at Scale: Random Selection is Almost All You Need Paper • 2410.09335 • Published Oct 12 • 16
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models Paper • 2410.07985 • Published Oct 10 • 27
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14 • 38
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Paper • 2410.10139 • Published Oct 14 • 51
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models Paper • 2410.09732 • Published Oct 13 • 54
MinerU: An Open-Source Solution for Precise Document Content Extraction Paper • 2409.18839 • Published Sep 27 • 26