---
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
---
# You Only Sample Once (YOSO)
![overview](overview.jpg)
YOSO was proposed in "[You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs](https://www.arxiv.org/abs/2403.12931)" by *Yihong Luo, Xiaolong Chen, Xinghua Qu, Jing Tang*.
The official repository for the paper is [YOSO](https://github.com/Luo-Yihong/YOSO).
This model is fine-tuned from [PixArt-XL-2-512x512](https://huggingface.co/PixArt-alpha/PixArt-XL-2-512x512), enabling one-step inference for text-to-image generation.
Note that YOSO-PixArt was originally trained at 512 resolution. However, we found that merging it with [PixArt-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-XL-2-1024-MS) yields a YOSO model that can generate samples at 1024 resolution (Section 6.3.1 in the paper). The strong results at this higher resolution indicate the robust generalization ability of YOSO.
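As a rough illustration of what merging two checkpoints can look like, the sketch below applies one common recipe: take the change that fine-tuning made to the low-resolution base weights and add it to the high-resolution base weights. This is an assumption made for illustration and is not necessarily the procedure described in Section 6.3.1; the function name `merge_finetune_delta`, the `alpha` weight, and the toy values are all hypothetical.

```python
from typing import Any, Dict


def merge_finetune_delta(
    base_lo: Dict[str, Any],
    tuned_lo: Dict[str, Any],
    base_hi: Dict[str, Any],
    alpha: float = 1.0,
) -> Dict[str, Any]:
    """Hypothetical helper: return base_hi + alpha * (tuned_lo - base_lo), key by key.

    Values can be floats, NumPy arrays, or torch tensors, as long as
    they support `+`, `-`, and multiplication by a float.
    """
    merged = {}
    for name, hi_value in base_hi.items():
        if name in base_lo and name in tuned_lo:
            # Add the fine-tuning delta learned on the low-resolution model.
            merged[name] = hi_value + alpha * (tuned_lo[name] - base_lo[name])
        else:
            # Keys that exist only in the high-resolution model are kept as-is.
            merged[name] = hi_value
    return merged


# Toy example with scalar "weights" standing in for real tensors.
base_lo = {"w": 1.0, "b": 0.5}
tuned_lo = {"w": 1.5, "b": 0.25}
base_hi = {"w": 2.0, "b": 1.0, "pos": 3.0}
merged = merge_finetune_delta(base_lo, tuned_lo, base_hi)
```

With real models, the three dictionaries would be the `state_dict()` outputs of the corresponding transformers, and keys whose shapes differ between resolutions would need separate handling.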
## Usage
```python
import torch
from diffusers import PixArtAlphaPipeline, LCMScheduler, Transformer2DModel

# Load the YOSO transformer weights.
transformer = Transformer2DModel.from_pretrained(
    "Luo-Yihong/yoso_pixart1024", torch_dtype=torch.float16
).to("cuda")

# Build the PixArt-alpha pipeline and swap in the YOSO transformer.
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-512x512",
    transformer=transformer,
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe = pipe.to("cuda")

# Switch to the LCM scheduler and set v-prediction.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.scheduler.config.prediction_type = "v_prediction"

generator = torch.manual_seed(318)  # fixed seed for reproducibility
imgs = pipe(
    prompt="Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.",
    num_inference_steps=1,  # one-step generation
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.0,
)[0]
imgs[0]  # first generated image (displayed when run in a notebook)
```
![Ship](ship_1024.jpg)
## BibTeX
```
@misc{luo2024sample,
title={You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs},
author={Yihong Luo and Xiaolong Chen and Xinghua Qu and Jing Tang},
year={2024},
eprint={2403.12931},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```