SoteDiffusion Cascade

Anime finetune of Stable Cascade Decoder.
Commercial use is not allowed due to StabilityAI's license terms.

Code Example

pip install diffusers transformers accelerate torch

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"

# Load the prior and this finetuned decoder in half precision.
prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_pre-alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_pre-alpha0", torch_dtype=torch.float16)

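# Step 1: the prior turns the prompt into image embeddings.
# CPU offload keeps VRAM usage low (requires accelerate).
prior.enable_model_cpu_offload()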
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=6.0,
    num_images_per_prompt=1,
    num_inference_steps=40
)

# Step 2: the decoder turns the image embeddings into the final image.
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=2.0,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")

Dataset

Used the same dataset as SoteDiffusion-Cascade_pre-alpha0.
Selected images from the newest dataset that scored above 0.98 on both the aesthetic and quality taggers.
Trained on ~98K images.
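
The selection rule above can be sketched as a simple filter. This is only an illustration, not the actual data pipeline: the record layout and the field names aesthetic_score and quality_score are hypothetical, and only the 0.98 threshold comes from the description above.

```python
# Hypothetical record layout; only the 0.98 threshold comes from the text above.
THRESHOLD = 0.98


def select_best(records, threshold=THRESHOLD):
    """Keep records whose aesthetic AND quality scores both exceed the threshold."""
    return [
        r for r in records
        if r["aesthetic_score"] > threshold and r["quality_score"] > threshold
    ]


records = [
    {"file": "a.png", "aesthetic_score": 0.99, "quality_score": 0.985},
    {"file": "b.png", "aesthetic_score": 0.99, "quality_score": 0.90},
    {"file": "c.png", "aesthetic_score": 0.98, "quality_score": 0.99},
]

best = select_best(records)
print([r["file"] for r in best])
```

Both scores must pass, so an image with one high score and one low score is dropped, and a score of exactly 0.98 does not count as "more than 0.98".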

Training:

GPU used for training: 1x AMD RX 7900 XTX 24GB

Software used: https://github.com/2kpr/StableCascade

Config:

experiment_id: sotediffusion-sc-b_3b
model_version: 3B
dtype: bfloat16
use_fsdp: False

batch_size: 64
grad_accum_steps: 64
updates: 3000
backup_every: 128
save_every: 32
warmup_updates: 100

lr: 4.0e-6
optimizer_type: Adafactor
adaptive_loss_weight: True
stochastic_rounding: True

image_size: 768
multi_aspect_ratio: [1/1, 1/2, 1/3, 2/3, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 9/16]
shift: 4

checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
output_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
webdataset_path: file:/mnt/DataSSD/AI/anime_image_dataset/best/newest_best-{0000..0001}.tar

effnet_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors
stage_a_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/stage_a.safetensors
generator_checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/stage_b-generator-049152.safetensors
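
For a rough sense of scale, the schedule in this config can be worked out by hand. This is a back-of-the-envelope sketch that assumes batch_size is the effective batch per optimizer update and that grad_accum_steps splits it into micro-batches; that reading of the trainer is an assumption and is not stated in this card. The dataset size is the approximate 98K figure from the Dataset section.

```python
# Values copied from the config above.
batch_size = 64
grad_accum_steps = 64
updates = 3000

# Approximate dataset size from the Dataset section.
dataset_size = 98_000

# Assumption: batch_size is the effective batch per update, so each
# forward/backward pass sees batch_size // grad_accum_steps images.
micro_batch = batch_size // grad_accum_steps

# Images seen over this run, and the matching number of passes over the data.
samples_seen = updates * batch_size
epochs = samples_seen / dataset_size

print(micro_batch, samples_seen, round(epochs, 2))
```

Under that assumption the run processes one image per pass on the single 24GB GPU and covers the dataset roughly twice.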

Limitations and Bias

Bias

This model is intended for anime illustrations.
Realistic capabilities are not tested at all.

Limitations

Eyes in far shots come out badly because of the heavy latent compression.
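
A rough calculation shows why small details suffer. This sketch assumes a spatial compression factor of 42.67 between the image and the prior's latents, which is the default resolution_multiple in the diffusers Stable Cascade prior pipeline; the 64 pixel face size is a hypothetical example of a far shot.

```python
import math

# Assumed compression factor between image pixels and prior latents
# (the diffusers default resolution_multiple for Stable Cascade).
PRIOR_FACTOR = 42.67

image_size = 1024  # same size as in the code example
face_px = 64       # hypothetical face width in a far shot

# Side length of the prior latent for a 1024px image.
latent_side = math.ceil(image_size / PRIOR_FACTOR)

# How many latent cells the face spans along one axis.
face_cells = face_px / PRIOR_FACTOR

print(latent_side, round(face_cells, 1))
```

A face that spans only one or two latent cells leaves very little room to encode the eyes, so close-up compositions tend to be the safer choice.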
