Describe Anything: Detailed Localized Image and Video Captioning • Paper • 2504.16072 • Published Apr 22 • 63
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment • Paper • 2410.09604 • Published Oct 12, 2024
Geospatial Mechanistic Interpretability of Large Language Models • Paper • 2505.03368 • Published May 6 • 10
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation • Paper • 2505.02836 • Published May 5 • 8
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding • Paper • 2505.17012 • Published May 22 • 12
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models • Paper • 2505.17015 • Published May 22 • 9
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces • Paper • 2506.00123 • Published May 30 • 35
Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts • Paper • 2505.23926 • Published May 29 • 5
RLPR: Extrapolating RLVR to General Domains without Verifiers • Paper • 2506.18254 • Published Jun 23 • 31
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs • Paper • 2506.21656 • Published Jun 26 • 14
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning • Paper • 2507.00432 • Published Jul 1 • 79
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding • Paper • 2507.23478 • Published Jul 31 • 15
MolmoAct: Action Reasoning Models that can Reason in Space • Paper • 2508.07917 • Published Aug 11 • 43
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study • Paper • 2508.13142 • Published Aug 18 • 34
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds • Paper • 2508.14879 • Published Aug 20 • 65
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space • Paper • 2508.19247 • Published Aug 26 • 41
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey • Paper • 2509.02547 • Published Sep 2 • 216
Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings • Paper • 2508.18733 • Published Aug 26 • 8
CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization • Paper • 2509.21150 • Published 19 days ago • 3
Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls • Paper • 2510.00184 • Published 14 days ago • 16
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning • Paper • 2510.03259 • Published 18 days ago • 50
Watch and Learn: Learning to Use Computers from Online Videos • Paper • 2510.04673 • Published 8 days ago • 9