metadata
base_model:
  - CompVis/stable-diffusion-v1-4
pipeline_tag: text-to-image
tags:
  - art

The Superposition of Diffusion Models Using the Itô Density Estimator: Pipeline


This pipeline shows how to superimpose different text prompts using Stable Diffusion v1-4, based on the paper The Superposition of Diffusion Models Using the Itô Density Estimator.


Requirements

This pipeline can be run with the following packages & versions:

  • PyTorch 2.5.1
  • Diffusers 0.32.1
  • Accelerate 1.2.1
  • Transformers 4.47.1

You can install these with:

pip install torch
pip install diffusers accelerate transformers
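
Because the install commands above are unpinned, the installed versions may differ from the ones listed. The following is a minimal sketch for comparing them, using only the Python standard library. The helper name compare_versions is hypothetical and not part of this repository, and a version mismatch does not necessarily mean the pipeline will fail.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the Requirements section above.
TESTED_VERSIONS = {
    "torch": "2.5.1",
    "diffusers": "0.32.1",
    "accelerate": "1.2.1",
    "transformers": "4.47.1",
}


def compare_versions(tested=TESTED_VERSIONS):
    """Hypothetical helper: return {package: (installed_or_None, tested)}."""
    report = {}
    for package, tested_version in tested.items():
        try:
            # Drop a local build tag such as "+cu121" before comparing.
            installed = version(package).split("+")[0]
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, tested_version)
    return report


if __name__ == "__main__":
    for package, (installed, tested_version) in compare_versions().items():
        status = "matches" if installed == tested_version else "differs"
        print(f"{package}: installed={installed}, listed={tested_version} ({status})")
```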

Example usage

from PIL import Image
from diffusers import DiffusionPipeline

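# Load the custom pipeline code from the Hub repository (this requires trust_remote_code=True).
pipeline = DiffusionPipeline.from_pretrained("superdiff/superdiff-sd-v1-4", custom_pipeline="pipeline", trust_remote_code=True)
# Superimpose the two prompts.
output = pipeline("a flamingo", "a candy cane", seed=1, num_inference_steps=1000, batch_size=1)

# Convert the first image in the batch to a PIL image and save it.
image = Image.fromarray(output[0].cpu().numpy())
image.save("superdiff_output.png")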

Arguments that the user can set in pipeline():

  • prompt_1 [required]: text prompt describing first concept to superimpose (e.g. "a flamingo")
  • prompt_2 [required]: text prompt describing second concept to superimpose (e.g. "a candy cane")
  • seed [optional: default=None]: seed for random noise generator for reproducibility; for non-deterministic outputs, set to None
  • num_inference_steps [optional: default=1000]: number of denoising steps (we recommend 1000!)
  • batch_size [optional: default=1]: batch size
  • lift [optional: default=0.0]: bias value that favours generation towards one prompt over the other
  • guidance_scale [optional: default=7.5]: scale for classifier-free guidance
  • height, width [optional: default=512]: height and width of generated images
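
The optional arguments above can be collected in one place and overridden selectively. The sketch below copies the defaults from the list. The helper make_pipeline_kwargs is hypothetical and not part of the pipeline, and it assumes that every optional argument, including height and width, is accepted as a keyword argument under the name shown in the list.

```python
# Defaults copied from the argument list above.
DEFAULTS = {
    "seed": None,
    "num_inference_steps": 1000,
    "batch_size": 1,
    "lift": 0.0,
    "guidance_scale": 7.5,
    "height": 512,
    "width": 512,
}


def make_pipeline_kwargs(**overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Catch misspelled argument names before a long generation run starts.
        raise ValueError(f"Unknown pipeline argument(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


kwargs = make_pipeline_kwargs(seed=1, batch_size=1)

# With a loaded pipeline, the two prompts are passed positionally:
# output = pipeline("a flamingo", "a candy cane", **kwargs)
```

Rejecting unknown names up front is a design choice for this sketch only: a misspelled argument is reported immediately instead of after the model has been loaded.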

To replicate images from Section 4.2 of the paper, you can use the following:

image = pipeline(prompt_1, prompt_2, seed=1, num_inference_steps=1000, batch_size=20, lift=0.0, guidance_scale=7.5)

(Note: with a batch size of 1, the runtime on an NVIDIA A40 GPU is around 3 minutes 30 seconds.)

Citation

BibTeX:

@article{skreta2024superposition,
  title={The Superposition of Diffusion Models Using the It{\^o} Density Estimator},
  author={Skreta, Marta and Atanackovic, Lazar and Bose, Avishek Joey and Tong, Alexander and Neklyudov, Kirill},
  journal={arXiv preprint arXiv:2412.17762},
  year={2024}
}