VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide
Abstract
Text-to-image (T2I) diffusion models have revolutionized visual content creation, but extending these capabilities to text-to-video (T2V) generation remains a challenge, particularly in preserving temporal consistency. Existing methods that aim to improve consistency often incur trade-offs such as reduced imaging quality and impractical computation time. To address these issues, we introduce VideoGuide, a novel framework that enhances the temporal consistency of pretrained T2V models without the need for additional training or fine-tuning. Instead, VideoGuide leverages any pretrained video diffusion model (VDM), or itself, as a guide during the early stages of inference, improving temporal quality by interpolating the guiding model's denoised samples into the sampling model's denoising process. The proposed method brings about significant improvement in temporal consistency and image fidelity, providing a cost-effective and practical solution that synergizes the strengths of various video diffusion models. Furthermore, we demonstrate prior distillation, revealing that base models can achieve enhanced text coherence by utilizing the superior data prior of the guiding model through the proposed method. Project Page: http://videoguide2025.github.io/
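To make the mechanism described above concrete, below is a minimal sketch of how a guiding VDM's denoised estimate could be interpolated into the sampling VDM's early denoising steps. This is not the released implementation: the callables `sample_model` and `guide_model`, the `guide_steps` and `interp_weight` parameters, and the simplified DDIM-style schedule are illustrative assumptions only.

```python
import torch

def videoguide_sample(sample_model, guide_model, x_T, prompt,
                      num_steps=50, guide_steps=5, interp_weight=0.5):
    """Sketch of guided sampling: blend the guide model's denoised (x0) estimate
    into the sampling model's estimate during the first `guide_steps` steps.

    `sample_model` and `guide_model` are assumed to be callables returning a
    denoised estimate x0_hat given (x_t, step, prompt); the alpha_bar schedule
    below is a stand-in for the actual noise schedule.
    """
    # alpha_bar schedule from noisy (t = T) to clean (t = 0); illustrative only.
    alpha_bars = torch.linspace(0.01, 0.999, num_steps + 1)
    x_t = x_T
    for i in range(num_steps):
        a_t, a_next = alpha_bars[i], alpha_bars[i + 1]

        # Denoised (x0) estimate from the sampling model.
        x0_hat = sample_model(x_t, i, prompt)

        if i < guide_steps:
            # Guide model's denoised estimate, interpolated in during the early
            # steps to transfer its temporal consistency to the sampling model.
            x0_guide = guide_model(x_t, i, prompt)
            x0_hat = (1 - interp_weight) * x0_hat + interp_weight * x0_guide

        # Deterministic DDIM-style update toward the next (less noisy) step.
        eps_hat = (x_t - a_t.sqrt() * x0_hat) / (1 - a_t).sqrt()
        x_t = a_next.sqrt() * x0_hat + (1 - a_next).sqrt() * eps_hat
    return x_t
```

In this sketch, guidance is applied only during the first few steps, matching the paper's idea of guiding the early stages of inference and leaving the remaining steps to the base sampling model.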
Community
Hello, thank you so much for your interest in our work!
We would like to share our updated Project Page and Code :)
Project Page: https://dohunlee1.github.io/videoguide.github.io/
Code: https://github.com/DoHunLee1/VideoGuide
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- One-Shot Learning Meets Depth Diffusion in Multi-Object Videos (2024)
- CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities (2024)
- Training-free Long Video Generation with Chain of Diffusion Model Experts (2024)
- JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation (2024)
- Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach (2024)