DiffuEraser: A Diffusion Model for Video Inpainting
Abstract
Recent video inpainting algorithms combine flow-based pixel propagation with transformer-based generation: optical flow propagates textures and objects from neighboring frames, while visual Transformers complete the remaining masked regions. However, these approaches often produce blurring and temporal inconsistencies when dealing with large masks, highlighting the need for models with stronger generative capabilities. Recently, diffusion models have emerged as a prominent technique in image and video generation owing to their impressive performance. In this paper, we introduce DiffuEraser, a video inpainting model based on Stable Diffusion, designed to fill masked regions with richer detail and more coherent structures. We incorporate prior information to provide initialization and weak conditioning, which helps mitigate noisy artifacts and suppress hallucinations. Additionally, to improve temporal consistency during long-sequence inference, we expand the temporal receptive fields of both the prior model and DiffuEraser, and further enhance consistency by exploiting the temporal smoothing property of Video Diffusion Models. Experimental results demonstrate that our proposed method outperforms state-of-the-art techniques in both content completeness and temporal consistency while maintaining acceptable efficiency.
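The abstract does not include code, but the "prior as initialization and weak conditioning" idea can be sketched with an SDEdit/RePaint-style scheme: noise the propagated prior to an intermediate timestep instead of starting from pure noise, then denoise while pinning known pixels to the observed frame. The sketch below is a minimal pixel-space illustration, not the authors' implementation; the `denoiser` network, the mask convention (1 = region to inpaint), and `init_step` are all assumptions.

```python
# Minimal sketch (assumptions, not DiffuEraser's released code): use a
# flow-propagated prior frame as the diffusion initialization so the model
# refines rather than hallucinates the masked region.
import torch

def inpaint_with_prior(denoiser, prior, frame, mask, num_steps=1000,
                       init_step=600, device="cuda"):
    # Standard DDPM linear beta schedule and cumulative alphas.
    betas = torch.linspace(1e-4, 0.02, num_steps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Initialization / weak conditioning: diffuse the prior to an
    # intermediate timestep instead of starting from pure noise.
    a_bar = alpha_bars[init_step]
    x = a_bar.sqrt() * prior + (1.0 - a_bar).sqrt() * torch.randn_like(prior)

    for t in reversed(range(init_step + 1)):
        t_batch = torch.full((x.shape[0],), t, device=device, dtype=torch.long)
        eps = denoiser(x, t_batch)  # hypothetical eps-prediction network

        # DDPM posterior mean for the reverse step.
        mean = (x - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean

        # Pin known (unmasked) pixels to the observed frame, re-noised to
        # the current timestep (RePaint-style blending); mask = 1 where masked.
        if t > 0:
            a_prev = alpha_bars[t - 1]
            known = a_prev.sqrt() * frame + (1.0 - a_prev).sqrt() * torch.randn_like(frame)
            x = mask * x + (1.0 - mask) * known
        else:
            x = mask * x + (1.0 - mask) * frame
    return x
```

Starting from a noised prior anchors the sample near the propagated content, which is one plausible reading of how the described initialization suppresses hallucinations; DiffuEraser itself builds on Stable Diffusion and therefore operates in latent space rather than pixel space.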
Community
The following similar papers were recommended by the Semantic Scholar API:
- 3D-Consistent Image Inpainting with Diffusion Models (2024)
- Advanced Video Inpainting Using Optical Flow-Guided Efficient Diffusion (2024)
- VipDiff: Towards Coherent and Diverse Video Inpainting via Training-free Denoising Diffusion Models (2025)
- DIVD: Deblurring with Improved Video Diffusion Model (2024)
- SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models (2024)
- DiffMVR: Diffusion-based Automated Multi-Guidance Video Restoration (2024)
- Tuning-Free Long Video Generation via Global-Local Collaborative Diffusion (2025)