MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning Paper • 2509.22281 • Published Sep 26 • 31
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 137
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent Paper • 2509.20414 • Published Sep 24 • 9
Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets Paper • 2509.21245 • Published Sep 25 • 38
PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation Paper • 2509.20358 • Published Sep 24 • 14
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective Paper • 2509.18905 • Published Sep 23 • 29
Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation Paper • 2509.12815 • Published Sep 16 • 39
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts Paper • 2509.10813 • Published Sep 13 • 30
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Paper • 2509.12201 • Published Sep 15 • 104
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published Aug 28 • 77
FastMesh:Efficient Artistic Mesh Generation via Component Decoupling Paper • 2508.19188 • Published Aug 26 • 17
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 208
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation Paper • 2508.13998 • Published Aug 19 • 18
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds Paper • 2508.14879 • Published Aug 20 • 68
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8 • 193