---
license: openrail++
library_name: diffusers
tags:
- text-to-image
- diffusers-training
- diffusers
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
base_model: stabilityai/stable-diffusion-xl-base-1.0
---
# Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
<div align="center">
<img src="assets/mapo_overview.png" width="750"/>
</div><br>

We propose **MaPO**, a reference-free, sample-efficient, and memory-friendly alignment technique for text-to-image diffusion models. For more details on the technique, please refer to our paper [here] (TODO).
## Developed by
* Jiwoo Hong<sup>*</sup> (KAIST AI)
* Sayak Paul<sup>*</sup> (Hugging Face)
* Noah Lee (KAIST AI)
* Kashif Rasul (Hugging Face)
* James Thorne (KAIST AI)
* Jongheon Jeong (Korea University)
## Dataset
This model was fine-tuned from [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) on the [cartoon split of Pick-Style](https://huggingface.co/datasets/mapo-t2i/pick-style-cartoon).
## Training Code
Refer to our code repository [here](https://github.com/mapo-t2i/mapo).
## Inference
```python
import torch
from diffusers import AutoencoderKL, DiffusionPipeline, UNet2DConditionModel

sdxl_id = "stabilityai/stable-diffusion-xl-base-1.0"
vae_id = "madebyollin/sdxl-vae-fp16-fix"  # SDXL VAE modified to run in fp16 without NaNs
unet_id = "mapo-t2i/mapo-pick-style-cartoon"  # MaPO-aligned UNet

# Load the fp16-safe VAE and the MaPO fine-tuned UNet.
vae = AutoencoderKL.from_pretrained(vae_id, torch_dtype=torch.float16)
unet = UNet2DConditionModel.from_pretrained(unet_id, subfolder="unet", torch_dtype=torch.float16)

# Swap both into the base SDXL pipeline; text encoders, tokenizers and scheduler come from SDXL.
pipeline = DiffusionPipeline.from_pretrained(
    sdxl_id, vae=vae, unet=unet, torch_dtype=torch.float16
).to("cuda")

prompt = "portrait of gorgeous cyborg with golden hair, high resolution"
image = pipeline(prompt=prompt, num_inference_steps=30).images[0]
image.save("mapo_cartoon.png")  # image is a PIL.Image
```
For qualitative results, please visit our [project website] (TODO).
## Citation
```bibtex
@misc{todo,
title={Margin-aware Preference Optimization for Aligning Diffusion Models without Reference},
author={Jiwoo Hong and Sayak Paul and Noah Lee and Kashif Rasul and James Thorne and Jongheon Jeong},
year={2024},
eprint={todo},
archivePrefix={arXiv},
primaryClass={cs.CV,cs.LG}
}
```