You Only Sample Once (YOSO)
This algorithm was proposed in You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs.
This model is fine-tuned from PixArt and performs text-to-image generation with one-step inference.
We want to highlight that YOSO-PixArt was originally trained at 512 resolution. However, we found that we can construct a YOSO model that generates samples at 1024 resolution by merging with PixArt-1024 (Eq. (15) in the paper). The quality of the resulting 1024-resolution samples indicates the robust generalization ability of YOSO.
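Eq. (15) is not reproduced in this card. As a rough illustration of what merging checkpoints in weight space can look like, the sketch below assumes a simple additive rule: the difference between a high-resolution base and a low-resolution base is added to the fine-tuned weights. This rule, the helper name `merge_state_dicts`, and the toy values are assumptions for illustration only; the paper should be consulted for the actual formula.

```python
def merge_state_dicts(finetuned, base_low, base_high):
    """Hypothetical helper: finetuned + (base_high - base_low), key by key.

    Values may be anything that supports + and -, e.g. floats or tensors.
    This is an assumed merge rule, not necessarily Eq. (15) of the paper.
    """
    if not (finetuned.keys() == base_low.keys() == base_high.keys()):
        raise ValueError("all three state dicts must share the same keys")
    return {
        name: finetuned[name] + (base_high[name] - base_low[name])
        for name in finetuned
    }


# Toy stand-ins for real checkpoints (illustrative numbers only).
yoso_512 = {"w": 1.5, "b": -0.5}
pixart_512 = {"w": 1.0, "b": 0.0}
pixart_1024 = {"w": 2.0, "b": 0.25}

merged = merge_state_dicts(yoso_512, pixart_512, pixart_1024)
print(merged)
```

With real models, the same function could in principle be applied to the `state_dict()` of each transformer, provided all three share parameter names and shapes.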
Usage
import torch
from diffusers import PixArtAlphaPipeline, LCMScheduler, Transformer2DModel

# Load the YOSO transformer (1024-resolution variant) in half precision.
transformer = Transformer2DModel.from_pretrained(
    "Yihong666/yoso_pixart1024", torch_dtype=torch.float16
).to("cuda")

# Build the PixArt pipeline around the YOSO transformer.
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-512x512",
    transformer=transformer,
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe = pipe.to("cuda")

# Switch to the LCM scheduler and use v-prediction.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.scheduler.config.prediction_type = "v_prediction"

generator = torch.manual_seed(318)  # fixed seed for reproducibility
imgs = pipe(
    prompt="Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.",
    num_inference_steps=1,  # one-step generation
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.0,  # a scale of 1.0 disables classifier-free guidance
)[0]
imgs[0]  # first generated image (displayed when run in a notebook)