Abstract
Recent progress in diffusion-based video editing has shown remarkable potential for practical applications. However, these methods remain prohibitively expensive and challenging to deploy on mobile devices. In this study, we introduce a series of optimizations that render mobile video editing feasible. Building upon an existing image editing model, we first optimize its architecture and incorporate a lightweight autoencoder. Subsequently, we extend classifier-free guidance distillation to multiple modalities, resulting in a threefold on-device speedup. Finally, we reduce the number of sampling steps to one by introducing a novel adversarial distillation scheme that preserves the controllability of the editing process. Collectively, these optimizations enable video editing at 12 frames per second on mobile devices, while maintaining high quality. Our results are available at https://qualcomm-ai-research.github.io/mobile-video-editing/
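For context on the classifier-free guidance distillation step, the sketch below shows the standard multi-modal guidance rule (in the style of InstructPix2Pix, with separate scales for the text instruction and the source image) that such a distillation folds into a single forward pass. This is a minimal illustration under assumed interfaces; the `model` callable, tensor shapes, and guidance scales are hypothetical and not the authors' actual implementation.

```python
import torch

def multimodal_cfg_epsilon(model, z_t, t, text_emb, image_cond,
                           s_text=7.5, s_image=1.5):
    """Classifier-free guidance over two conditioning modalities
    (text instruction and source image). A distilled student would
    replace these three teacher evaluations with one forward pass."""
    null_text = torch.zeros_like(text_emb)
    null_image = torch.zeros_like(image_cond)

    # Three teacher evaluations: unconditional, image-only, and fully conditioned.
    eps_uncond = model(z_t, t, null_text, null_image)
    eps_image = model(z_t, t, null_text, image_cond)
    eps_full = model(z_t, t, text_emb, image_cond)

    # Combine predictions with a separate guidance scale per modality.
    return (eps_uncond
            + s_image * (eps_image - eps_uncond)
            + s_text * (eps_full - eps_image))
```

Distilling this combination away removes two of the three network evaluations per sampling step, which is consistent with the roughly threefold on-device speedup reported in the abstract.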
Community
An efficient zero-shot video diffusion model for text-based video editing on phones.