DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
Abstract
The increasing demand for immersive AR/VR applications and spatial intelligence has heightened the need to generate high-quality scene-level and 360° panoramic video. However, most video diffusion models are constrained by limited resolution and aspect ratio, which restricts their applicability to scene-level dynamic content synthesis. In this work, we propose DynamicScaler, which addresses these challenges by enabling spatially scalable and panoramic dynamic scene synthesis that preserves coherence across panoramic scenes of arbitrary size. Specifically, we introduce an Offset Shifting Denoiser that enables efficient, synchronous, and coherent denoising of panoramic dynamic scenes with a fixed-resolution diffusion model through a seamlessly rotating window, ensuring smooth boundary transitions and consistency across the entire panoramic space while accommodating varying resolutions and aspect ratios. Additionally, we employ a Global Motion Guidance mechanism to ensure both local detail fidelity and global motion continuity. Extensive experiments demonstrate that our method achieves superior content and motion quality in panoramic scene-level video generation, offering a training-free, efficient, and scalable solution for immersive dynamic scene creation with constant VRAM consumption regardless of output video resolution. Our project page is available at https://dynamic-scaler.pages.dev/.
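Below is a minimal sketch of the shifting-window denoising idea described in the abstract, not the paper's released implementation: `denoise_step` is a hypothetical stand-in for one step of a fixed-resolution video diffusion model, and the panoramic latent is assumed to be laid out as (C, T, H, W) with horizontal wrap-around at the 360° seam.

```python
import torch

def offset_shift_denoise(latent, denoise_step, timesteps, win_w, shift):
    """Denoise a panoramic latent wider than the base model's window.

    At every timestep the fixed-size window slides by a different offset
    (modulo the panorama width W), so window seams fall in new places at
    each step, and the wrap-around indexing keeps the 360-degree boundary
    seamless. Only one window is resident at a time, so peak memory is
    independent of the panorama width.
    """
    C, T, H, W = latent.shape
    for i, t in enumerate(timesteps):
        offset = (i * shift) % W                      # rotate the window each step
        out = torch.zeros_like(latent)
        count = torch.zeros(1, 1, 1, W, device=latent.device)
        for x0 in range(offset, offset + W, win_w):
            cols = torch.arange(x0, x0 + win_w) % W   # wrap past the seam
            out[..., cols] += denoise_step(latent[..., cols], t)
            count[..., cols] += 1
        latent = out / count                          # average overlapping windows
    return latent
```

Because each diffusion step processes one fixed-size window at a time and merges the results into the shared latent, this style of scheme matches the abstract's constant-VRAM claim: memory cost depends on the window size, not the output resolution.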
Community
This paper focuses on 360-degree panoramic video generation, a crucial component of spatial intelligence for immersive dynamic scenes. Challenges in this area include the difficulty of collecting complete 360-degree panoramic video data, the limited generalization of models trained on small datasets (e.g., 360DVD), the restricted motion range of inversion-based strategies (such as 4K4DGen), and the often-overlooked problem of achieving continuous, loopable scene-level dynamic effects.

To overcome these, we propose DynamicScaler, a novel framework that generates high-resolution dynamic effects at arbitrary spatial scale and creates 360-degree dynamic panoramas. It supports both text-conditioned and text-image-conditioned generation and can produce theoretically infinite-length or loopable motion effects without any training data. By integrating concepts from prior works, it offers a robust, training-free, and data-free solution to the data limitations and quality constraints of dynamic effect generation.

Our exploration led to unexpected and significant breakthroughs. Notably, the framework can generate 360-degree panoramas directly from text, eliminating the need for large datasets of perspective (field-of-view) videos, a major advance given how hard such data is to obtain and the quality loss incurred in field-of-view transitions. In addition, the shift mechanism enables near-infinite or loopable scene-level dynamic effects, enhancing immersiveness in AR/VR environments. Overall, the framework sets a better stage for 360-degree panorama generation and shows strong potential in applications such as 3D gaming and film design for creating more immersive 4D spatial experiences.
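As a companion to the spatial sketch above, the loopable behaviour mentioned in this comment can be illustrated by applying the same shifting wrap-around window along the time axis; `denoise_step`, `win_t`, and `shift` are again hypothetical stand-ins, and the paper's actual mechanism may differ in detail.

```python
import torch

def loopable_denoise(latent, denoise_step, timesteps, win_t, shift):
    """Denoise a (C, T, H, W) latent with temporal wrap-around windows.

    Because some windows straddle the T -> 0 boundary, the first and last
    frames are denoised jointly at certain steps, so the finished clip
    loops without a visible cut; the offset shift per step plays the same
    seam-hiding role as in the spatial case.
    """
    C, T, H, W = latent.shape
    for i, t in enumerate(timesteps):
        offset = (i * shift) % T
        out = torch.zeros_like(latent)
        count = torch.zeros(1, T, 1, 1, device=latent.device)
        for f0 in range(offset, offset + T, win_t):
            frames = torch.arange(f0, f0 + win_t) % T   # wrap in time
            out[:, frames] += denoise_step(latent[:, frames], t)
            count[:, frames] += 1
        latent = out / count
    return latent
```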
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation (2024)
- MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models (2024)
- Imagine360: Immersive 360 Video Generation from Perspective Anchor (2024)
- DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion (2024)
- MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis (2024)
- MSG score: A Comprehensive Evaluation for Multi-Scene Video Generation (2024)
- Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes (2024)