Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation Paper • 2510.09320 • Published 6 days ago • 1
DreamLLM: Synergistic Multimodal Comprehension and Creation Paper • 2309.11499 • Published Sep 20, 2023 • 59
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper • 2507.04447 • Published Jul 6, 2025 • 44
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image Paper • 2502.12894 • Published Feb 18, 2025 • 18
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models Paper • 2506.03135 • Published Jun 3, 2025 • 38
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30, 2025 • 97
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection Paper • 2412.04455 • Published Dec 5, 2024 • 38