Xudong Xu

Sheldoooon

https://sheldontsui.github.io/

SheldonTsui

AI & ML interests

AIGC for Embodied AI

upvoted 18 papers 3 months ago

MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning

Paper • 2509.22281 • Published Sep 26 • 31

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26 • 137

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Paper • 2509.20414 • Published Sep 24 • 9

Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets

Paper • 2509.21245 • Published Sep 25 • 38

PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

Paper • 2509.20358 • Published Sep 24 • 14

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24 • 98

How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective

Paper • 2509.18905 • Published Sep 23 • 29

Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22 • 140

Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation

Paper • 2509.12815 • Published Sep 16 • 39

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts

Paper • 2509.10813 • Published Sep 13 • 30

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Paper • 2509.12201 • Published Sep 15 • 104

EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Paper • 2508.21112 • Published Aug 28 • 77

Mixture of Contexts for Long Video Generation

Paper • 2508.21058 • Published Aug 28 • 35

FastMesh: Efficient Artistic Mesh Generation via Component Decoupling

Paper • 2508.19188 • Published Aug 26 • 17

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 208

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21 • 256

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

Paper • 2508.13998 • Published Aug 19 • 18

DINOv3

Paper • 2508.10104 • Published Aug 13 • 285

upvoted 2 papers 4 months ago

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

Paper • 2508.14879 • Published Aug 20 • 68

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8 • 193