ReferEverything: Towards Segmenting Everything We Can Speak of in Videos • Paper • 2410.23287 • Published 7 days ago • 17
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding • Paper • 2409.03757 • Published Sep 5 • 2
Situational Awareness Matters in 3D Vision Language Reasoning • Paper • 2406.07544 • Published Jun 11 • 1
Frozen Transformers in Language Models Are Effective Visual Encoder Layers • Paper • 2310.12973 • Published Oct 19, 2023 • 1
Floating No More: Object-Ground Reconstruction from a Single Image • Paper • 2407.18914 • Published Jul 26 • 18