FlashWorld: High-quality 3D Scene Generation within Seconds • Paper • 2510.13678 • Published 23 days ago • 70
Diffusion Transformers with Representation Autoencoders • Paper • 2510.11690 • Published 25 days ago • 160
Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding • Paper • 2509.15178 • Published Sep 18 • 6
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels • Paper • 2507.21809 • Published Jul 29 • 131
Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy • Paper • 2506.22432 • Published Jun 27 • 13
Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation • Paper • 2506.04225 • Published Jun 4 • 28
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models • Paper • 2412.09645 • Published Dec 10, 2024 • 36
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion • Paper • 2412.09593 • Published Dec 12, 2024 • 18
Material Anything: Generating Materials for Any 3D Object via Diffusion • Paper • 2411.15138 • Published Nov 22, 2024 • 50
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context • Paper • 2403.05530 • Published Mar 8, 2024 • 66