FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Abstract
Visual diffusion models have achieved remarkable progress, yet they are typically trained at limited resolutions because high-resolution data is scarce and computational resources are constrained. This hampers their ability to generate high-fidelity images or videos at higher resolutions. Recent efforts have explored tuning-free strategies to unlock the untapped potential of pre-trained models for higher-resolution visual generation. However, these methods are still prone to producing low-quality visual content with repetitive patterns. The key obstacle lies in the inevitable increase in high-frequency information when the model generates visual content exceeding its training resolution, which leads to undesirable repetitive patterns that stem from accumulated errors. To tackle this challenge, we propose FreeScale, a tuning-free inference paradigm that enables higher-resolution visual generation via scale fusion. Specifically, FreeScale processes information from different receptive scales and then fuses it by extracting the desired frequency components. Extensive experiments validate the superiority of our paradigm in extending the capabilities of higher-resolution visual generation for both image and video models. Notably, compared with the previous best-performing method, FreeScale unlocks the generation of 8k-resolution images for the first time.
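As a rough illustration of what "fusing by extracting desired frequency components" can look like, the sketch below combines the low-frequency band of one feature map with the high-frequency band of another using a Gaussian low-pass filter in the Fourier domain. It is not taken from the FreeScale code. The function names, the `sigma` parameter, and the choice of which input supplies which band are all assumptions made for illustration.

```python
import numpy as np


def gaussian_lowpass_mask(height, width, sigma):
    """Gaussian low-pass mask on the unshifted 2D FFT grid (hypothetical helper)."""
    fy = np.fft.fftfreq(height)[:, None]  # vertical frequencies, cycles per sample
    fx = np.fft.fftfreq(width)[None, :]   # horizontal frequencies
    return np.exp(-(fx ** 2 + fy ** 2) / (2.0 * sigma ** 2))


def fuse_by_frequency(low_source, high_source, sigma=0.1):
    """Keep low frequencies of `low_source` and high frequencies of `high_source`.

    Both inputs are real arrays whose last two axes are spatial.
    `sigma` is an assumed cutoff, expressed in cycles per sample.
    """
    if low_source.shape != high_source.shape:
        raise ValueError("inputs must have the same shape")
    mask = gaussian_lowpass_mask(low_source.shape[-2], low_source.shape[-1], sigma)
    # Low band: filter the first input directly.
    low = np.fft.ifft2(np.fft.fft2(low_source) * mask).real
    # High band: subtract the low-passed version of the second input from itself.
    high = high_source - np.fft.ifft2(np.fft.fft2(high_source) * mask).real
    return low + high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((4, 32, 32))  # stand-in for one scale's features
    b = rng.standard_normal((4, 32, 32))  # stand-in for another scale's features
    fused = fuse_by_frequency(a, b)
    print(fused.shape)
```

Because the low and high bands are complementary, fusing an array with itself returns the array unchanged, which is a convenient sanity check for any variant of this filter.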
Community
FreeScale is a tuning-free method for higher-resolution visual generation, unlocking 8k image generation!
Project Page: http://haonanqiu.com/projects/FreeScale.html
Code: https://github.com/ali-vilab/FreeScale
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AccDiffusion v2: Towards More Accurate Higher-Resolution Diffusion Extrapolation (2024)
- FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling (2024)
- TASR: Timestep-Aware Diffusion Model for Image Super-Resolution (2024)
- Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation (2024)
- DiT4Edit: Diffusion Transformer for Image Editing (2024)
- Zoomed In, Diffused Out: Towards Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution (2024)
- FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution (2024)