LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation
Abstract
Recent advancements in image generation models have enabled personalized image creation with both user-defined subjects (content) and styles. Prior works achieved personalization by merging the corresponding low-rank adaptation (LoRA) parameters through optimization-based methods, which are computationally demanding and unsuitable for real-time use on resource-constrained devices like smartphones. To address this, we introduce LoRA.rar, a method that not only improves image quality but also achieves a speedup of over 4000× in the merging process. LoRA.rar pre-trains a hypernetwork on a diverse set of content-style LoRA pairs, learning an efficient merging strategy that generalizes to new, unseen content-style pairs, enabling fast, high-quality personalization. Moreover, we identify limitations in existing evaluation metrics for content-style quality and propose a new protocol using multimodal large language models (MLLMs) for more accurate assessment. Our method significantly outperforms the current state of the art in both content and style fidelity, as validated by MLLM assessments and human evaluations.
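To make the core idea concrete, below is a minimal PyTorch sketch of hypernetwork-based LoRA merging. The MLP architecture, the pooled-statistics conditioning, and the ZipLoRA-style per-column merge coefficients are illustrative assumptions for one layer, not the paper's exact design; the point is that merging becomes a single forward pass instead of a per-pair optimization.

```python
import torch
import torch.nn as nn

class MergeHypernetwork(nn.Module):
    """Predicts per-column merge coefficients for a content/style LoRA pair.

    Minimal sketch: the actual LoRA.rar architecture may differ. Here a small
    MLP reads column-wise statistics of both LoRA updates and emits one
    coefficient per output column for each branch (ZipLoRA-style merging).
    """

    def __init__(self, out_features: int, hidden: int = 256):
        super().__init__()
        # Input: column-wise means of the two LoRA updates, concatenated.
        self.mlp = nn.Sequential(
            nn.Linear(2 * out_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * out_features),
        )
        self.out_features = out_features

    def forward(self, delta_c: torch.Tensor, delta_s: torch.Tensor):
        # delta_c, delta_s: (out_features, in_features) low-rank updates B @ A.
        feats = torch.cat([delta_c.mean(dim=1), delta_s.mean(dim=1)])
        m = self.mlp(feats)
        m_c, m_s = m.split(self.out_features)
        return m_c, m_s


def merge_loras(B_c, A_c, B_s, A_s, hypernet):
    """Merge one layer's content and style LoRAs in a single forward pass."""
    delta_c, delta_s = B_c @ A_c, B_s @ A_s
    m_c, m_s = hypernet(delta_c, delta_s)
    # Per-column reweighting of each update before summation.
    return m_c.unsqueeze(1) * delta_c + m_s.unsqueeze(1) * delta_s


# Usage: rank-4 LoRAs on a 320x320 projection, merged without any optimization.
d, r = 320, 4
hypernet = MergeHypernetwork(out_features=d)
B_c, A_c = torch.randn(d, r), torch.randn(r, d)
B_s, A_s = torch.randn(d, r), torch.randn(r, d)
delta_w = merge_loras(B_c, A_c, B_s, A_s, hypernet)  # (320, 320)
```

Once the hypernetwork is pre-trained across many content-style pairs, merging a new, unseen pair costs only this forward pass, which is what enables the reported speedup over optimization-based merging.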
Community
Excited to announce the release of LoRA.rar, a groundbreaking method for personalized content and style image generation.
In this work, we:
✨ Pre-trained a hypernetwork to enable zero-shot merging of unseen content-style LoRAs.
🤝 Proposed a new evaluation protocol built around MARS², an MLLM-based metric for content-style fidelity that aligns closely with user preferences (a hedged sketch of the idea follows this list).
⚡️ Achieved higher generation fidelity with a smaller footprint than ZipLoRA (2024 SOTA): LoRA.rar is 4000× faster at merging, uses 3× fewer parameters than a single subject-style combination of ZipLoRA, and outperforms the current state of the art in both content and style fidelity, as validated by MLLM assessments and human evaluations.
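As a rough illustration of what an MLLM-based fidelity protocol can look like, the sketch below asks a multimodal model two yes/no questions per generated image and aggregates the answers. The prompts, the binary scoring, and the `query_mllm` helper are hypothetical placeholders, not the paper's exact MARS² definition.

```python
# Hypothetical sketch of an MLLM-based content/style fidelity check.
# `query_mllm` is a placeholder for any multimodal LLM API that takes images
# and a prompt and returns a text answer; prompts and the binary aggregation
# are illustrative only, not the paper's MARS^2 metric.

CONTENT_PROMPT = (
    "Does the generated image (first) depict the same subject as the "
    "reference image (second)? Answer strictly 'yes' or 'no'."
)
STYLE_PROMPT = (
    "Does the generated image (first) match the artistic style of the "
    "reference image (second)? Answer strictly 'yes' or 'no'."
)

def mllm_fidelity(generated, subject_ref, style_ref, query_mllm) -> dict:
    """Score one generation for subject and style fidelity via an MLLM."""
    content_ok = query_mllm(images=[generated, subject_ref],
                            prompt=CONTENT_PROMPT).strip().lower() == "yes"
    style_ok = query_mllm(images=[generated, style_ref],
                          prompt=STYLE_PROMPT).strip().lower() == "yes"
    return {"content": content_ok, "style": style_ok,
            "joint": content_ok and style_ok}

def evaluate(samples, query_mllm) -> float:
    """Fraction of (generated, subject_ref, style_ref) triples judged
    faithful to both the subject and the style."""
    scores = [mllm_fidelity(g, c, s, query_mllm) for g, c, s in samples]
    return sum(r["joint"] for r in scores) / max(len(scores), 1)
```

Judging content and style separately, then jointly, is what lets such a protocol surface failure modes (e.g. style leaking into the subject) that single-score similarity metrics blur together.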
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- LoRA Diffusion: Zero-Shot LoRA Synthesis for Diffusion Model Personalization (2024)
- DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching (2024)
- Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs (2024)
- Personalized Image Generation with Large Multimodal Models (2024)
- LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement (2024)
- Diffusion Self-Distillation for Zero-Shot Customized Image Generation (2024)
- Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models (2024)