AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs Paper • 2507.05687 • Published Jul 8 • 26
RLPR: Extrapolating RLVR to General Domains without Verifiers Paper • 2506.18254 • Published Jun 23 • 32
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding Paper • 2505.20715 • Published May 27 • 2
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes Paper • 2504.15270 • Published Apr 21 • 10
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization Paper • 2503.23733 • Published Mar 31 • 11
Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs Paper • 2503.12303 • Published Mar 16 • 7
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Paper • 2503.12797 • Published Mar 17 • 32
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation Paper • 2501.06598 • Published Jan 11 • 2
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models Paper • 2501.05767 • Published Jan 10 • 30
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer Paper • 2412.13871 • Published Dec 18, 2024 • 18
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding Paper • 2411.03628 • Published Nov 6, 2024 • 2