metadata
pipeline_tag: text-to-image
license: other
license_name: stable-cascade-nc-community
license_link: LICENSE
SoteDiffusion Cascade
Anime finetune of Stable Cascade.
Currently at a very early stage of training.
No commercial use, per StabilityAI's Stable Cascade non-commercial community license.
Code Example
pip install diffusers
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"

prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_pre-alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_pre-alpha0", torch_dtype=torch.float16)

prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=6.0,
    num_images_per_prompt=1,
    num_inference_steps=40
)

decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=2.0,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")
Training Status:
GPU used for training: 1x AMD RX 7900 XTX 24GB
dataset name | training done | remaining |
---|---|---|
newest | 002 | 218 |
late | 002 | 204 |
mid | 002 | 199 |
early | 002 | 053 |
oldest | 002 | 014 |
pixiv | 002 | 072 |
visual novel cg | 002 | 068 |
anime wallpaper | 002 | 011 |
Total | 24 | 839 |
Note: Chunk numbering starts from 0 (so 002 means three chunks are done), and there are 8000 images per chunk.
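
The figures in the table can be reproduced from the chunk indices. This is a minimal sketch, assuming the "training done" column holds the zero-based index of the last finished chunk. The total chunk counts are copied from the Dataset table in this card, and the names `TOTAL_CHUNKS` and `remaining_chunks` are made up for illustration.

```python
# Total chunks per dataset, copied from the Dataset table in this card.
TOTAL_CHUNKS = {
    "newest": 221, "late": 207, "mid": 202, "early": 56,
    "oldest": 17, "pixiv": 75, "visual novel cg": 71, "anime wallpaper": 14,
}


def remaining_chunks(total: int, last_done_index: int) -> int:
    """Chunks left once indices 0..last_done_index are finished."""
    return total - (last_done_index + 1)


last_done_index = 2  # "002" in the table (assumed to be a zero-based index)
remaining = {
    name: remaining_chunks(total, last_done_index)
    for name, total in TOTAL_CHUNKS.items()
}
done_total = (last_done_index + 1) * len(TOTAL_CHUNKS)

print(remaining["newest"], done_total, sum(remaining.values()))  # 218 24 839
```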
Dataset:
GPU used for captioning: 1x Intel ARC A770 16GB
Model used for captioning: SmilingWolf/wd-v1-4-convnextv2-tagger-v2
dataset name | total images | total chunks |
---|---|---|
newest | 1,766,335 | 221 |
late | 1,652,420 | 207 |
mid | 1,609,608 | 202 |
early | 442,368 | 056 |
oldest | 128,311 | 017 |
pixiv | 594,046 | 075 |
visual novel cg | 560,903 | 071 |
anime wallpaper | 106,882 | 014 |
Total | 6,860,873 | 863 |
Note: The smallest image size is 1280x600 (768,000 pixels).
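
The chunk counts are consistent with ceiling division of the image totals by the 8000 images per chunk noted under Training Status. A short sketch of that arithmetic follows. The names are illustrative and not taken from any dataset tooling, and it assumes the last chunk of each dataset may be only partially filled.

```python
import math

IMAGES_PER_CHUNK = 8000  # from the note under Training Status

# Image totals copied from the table above.
TOTAL_IMAGES = {
    "newest": 1_766_335, "late": 1_652_420, "mid": 1_609_608,
    "early": 442_368, "oldest": 128_311, "pixiv": 594_046,
    "visual novel cg": 560_903, "anime wallpaper": 106_882,
}


def chunk_count(images: int, per_chunk: int = IMAGES_PER_CHUNK) -> int:
    """Number of chunks needed (assumes a partially filled last chunk)."""
    return math.ceil(images / per_chunk)


chunks = {name: chunk_count(n) for name, n in TOTAL_IMAGES.items()}
print(chunks["newest"], sum(TOTAL_IMAGES.values()), sum(chunks.values()))
# 221 6860873 863
```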
Tags:
aesthetic tags, quality tags, date tags, custom tags, rest of the tags
Date:
tag | date |
---|---|
newest | 2022 to 2024 |
late | 2019 to 2021 |
mid | 2015 to 2018 |
early | 2011 to 2014 |
oldest | 2005 to 2010 |
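
For picking a date tag at prompt time, the table can be written as a small lookup. This is only a sketch: it assumes the year ranges are inclusive on both ends, and `DATE_RANGES` and `date_tag` are hypothetical names, not part of the dataset tooling.

```python
from typing import Optional

# (first year, last year, tag), copied from the table above.
DATE_RANGES = [
    (2022, 2024, "newest"),
    (2019, 2021, "late"),
    (2015, 2018, "mid"),
    (2011, 2014, "early"),
    (2005, 2010, "oldest"),
]


def date_tag(year: int) -> Optional[str]:
    """Return the date tag for a year, or None if the table does not cover it."""
    for first, last, tag in DATE_RANGES:
        if first <= year <= last:  # assumed inclusive on both ends
            return tag
    return None


print(date_tag(2023), date_tag(2018), date_tag(2004))  # newest mid None
```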
Aesthetic Tags:
Model used: shadowlilac/aesthetic-shadow
score greater than | tag |
---|---|
0.980 | extremely aesthetic |
0.900 | very aesthetic |
0.750 | aesthetic |
0.500 | slightly aesthetic |
0.350 | not displeasing |
0.250 | not aesthetic |
0.125 | slightly displeasing |
0.025 | displeasing |
rest of them | very displeasing |
Quality Tags:
Model used: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth
score greater than | tag |
---|---|
0.980 | best quality |
0.900 | high quality |
0.750 | great quality |
0.500 | medium quality |
0.250 | normal quality |
0.125 | bad quality |
0.025 | low quality |
rest of them | worst quality |
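
Both threshold tables (aesthetic and quality) follow the same rule: take the first tag whose threshold the score exceeds, and otherwise fall back to the last row. A minimal sketch of that rule is below. It assumes scores are floats in the 0 to 1 range and that "greater than" is strict. The function names are illustrative, and running the scoring models themselves is not covered.

```python
# (threshold, tag) pairs copied from the two tables, highest threshold first.
AESTHETIC_THRESHOLDS = [
    (0.980, "extremely aesthetic"), (0.900, "very aesthetic"),
    (0.750, "aesthetic"), (0.500, "slightly aesthetic"),
    (0.350, "not displeasing"), (0.250, "not aesthetic"),
    (0.125, "slightly displeasing"), (0.025, "displeasing"),
]
QUALITY_THRESHOLDS = [
    (0.980, "best quality"), (0.900, "high quality"),
    (0.750, "great quality"), (0.500, "medium quality"),
    (0.250, "normal quality"), (0.125, "bad quality"),
    (0.025, "low quality"),
]


def score_to_tag(score, thresholds, fallback):
    """Return the tag of the first threshold that the score strictly exceeds."""
    for threshold, tag in thresholds:
        if score > threshold:
            return tag
    return fallback  # the "rest of them" row


def aesthetic_tag(score):
    return score_to_tag(score, AESTHETIC_THRESHOLDS, "very displeasing")


def quality_tag(score):
    return score_to_tag(score, QUALITY_THRESHOLDS, "worst quality")


print(aesthetic_tag(0.99), "|", quality_tag(0.30))
# extremely aesthetic | normal quality
```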
Custom Tags:
dataset name | custom tag |
---|---|
image boards | date, |
pixiv | art by Display_Name, |
visual novel cg | Full_VN_Name (short_3_letter_name), visual novel cg, |
anime wallpaper | date, anime wallpaper, |
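
To show how the tag groups fit together, here is a rough sketch of assembling a caption. It assumes the line under Tags gives the order of the groups and that tags are joined with ", ". The helper names are hypothetical, and the card does not say which datasets carry a date tag beyond what the table above lists, so the date is left optional.

```python
def build_caption(aesthetic, quality, custom, rest, date=None):
    """Join tag groups in the order listed under Tags, skipping empty ones."""
    groups = [aesthetic, quality, date, *custom, *rest]
    return ", ".join(tag for tag in groups if tag)


def pixiv_custom_tags(display_name):
    """Custom tags for the pixiv dataset, following the table above."""
    return [f"art by {display_name}"]


def visual_novel_custom_tags(full_name, short_name):
    """Custom tags for the visual novel cg dataset, following the table above."""
    return [f"{full_name} ({short_name})", "visual novel cg"]


caption = build_caption(
    aesthetic="very aesthetic",
    quality="high quality",
    custom=pixiv_custom_tags("Display_Name"),
    rest=["1girl", "solo"],
)
print(caption)
# very aesthetic, high quality, art by Display_Name, 1girl, solo
```

Note that the training command in this card passes `--shuffle_caption`, so the order stored in a caption file is not necessarily the order seen during training.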
Training Params:
Software used: Kohya SD-Scripts (Stable Cascade branch)
Base model: KBlueLeaf/Stable-Cascade-FP16-fixed
Command:
accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
--mixed_precision fp16 \
--save_precision fp16 \
--full_fp16 \
--sdpa \
--gradient_checkpointing \
--resolution "1024,1024" \
--train_batch_size 2 \
--gradient_accumulation_steps 32 \
--adaptive_loss_weight \
--learning_rate 4e-6 \
--lr_scheduler constant_with_warmup \
--lr_warmup_steps 100 \
--optimizer_type adafactor \
--optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
--max_grad_norm 0 \
--token_warmup_min 1 \
--token_warmup_step 0 \
--shuffle_caption \
--caption_dropout_rate 0 \
--caption_tag_dropout_rate 0 \
--caption_dropout_every_n_epochs 0 \
--dataset_repeats 1 \
--save_state \
--save_every_n_steps 128 \
--sample_every_n_steps 32 \
--max_token_length 225 \
--max_train_epochs 1 \
--caption_extension ".txt" \
--max_data_loader_n_workers 2 \
--persistent_data_loader_workers \
--enable_bucket \
--min_bucket_reso 256 \
--max_bucket_reso 4096 \
--bucket_reso_steps 64 \
--bucket_no_upscale \
--log_with tensorboard \
--output_name sotediffusion-sc_3b \
--train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002 \
--in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002.json \
--output_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2 \
--logging_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2/logs \
--resume /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1-state \
--stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1.safetensors \
--effnet_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors \
--previewer_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/previewer.safetensors \
--sample_prompts /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-prompt.txt
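
As a rough guide to what these flags mean in practice, the effective batch size and the number of optimizer steps can be estimated as below. This is a back-of-the-envelope sketch that assumes a single GPU (as listed under Training Status) and ignores the effect of aspect-ratio bucketing on batch boundaries. The card does not state how many images `combined-0002` contains, so the 8000 used in the example is only the per-dataset chunk size.

```python
import math

TRAIN_BATCH_SIZE = 2              # --train_batch_size
GRADIENT_ACCUMULATION_STEPS = 32  # --gradient_accumulation_steps
NUM_GPUS = 1                      # 1x AMD RX 7900 XTX, per Training Status

effective_batch = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_GPUS


def optimizer_steps(num_images: int) -> int:
    """Approximate optimizer steps for one pass over num_images images."""
    return math.ceil(num_images / effective_batch)


print(effective_batch, optimizer_steps(8000))  # 64 125
```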
Limitations and Bias
Bias
- This model is intended for anime illustrations.
  Realistic capabilities are not tested at all.
- The current version is biased toward older anime styles.
Limitations
- Can fall back to realistic.
  Use the "anime illustration" tag to point it in the right direction.
- Eyes in far shots are bad because of the heavy latent compression.