---
pipeline_tag: text-to-image
license: other
license_name: stable-cascade-nc-community
license_link: LICENSE
---

# SoteDiffusion Cascade

Anime finetune of Stable Cascade.
Currently in a very early stage of training.
No commercial use, thanks to StabilityAI.

## Code Example

```shell
pip install diffusers transformers accelerate
```

```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "newest, 1girl, solo, cat ears, looking at viewer, blush, light smile,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, fat, child,"

# Load the prior and the decoder in half precision.
prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_alpha0", torch_dtype=torch.float16)

# Step 1: the prior generates image embeddings from the prompt.
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=40
)

# Step 2: the decoder turns the image embeddings into the final image.
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=1.5,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")
```

## Training Status:

**Alpha0 Release**: This release resets the training and enables Text Encoder training.

**GPU used for training**: 1x AMD RX 7900 XTX 24GB

| dataset name | training done | remaining |
|---|---|---|
| **newest** | 000 | 230 |
| **recent** | 000 | 206 |
| **mid** | 000 | 201 |
| **early** | 000 | 055 |
| **oldest** | 000 | 016 |
| **pixiv** | 000 | 074 |
| **visual novel cg** | 000 | 070 |
| **anime wallpaper** | 000 | 013 |
| **Total** | 8 | 865 |

**Note**: Chunk numbering starts from 0, and each chunk contains 8000 images.

## Dataset:

**GPU used for captioning**: 1x Intel ARC A770 16GB

**Model used for captioning**: SmilingWolf/wd-swinv2-tagger-v3

**Command:**
```
python /mnt/DataSSD/AI/Apps/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py \
--model_dir "/mnt/DataSSD/AI/models/wd14_tagger_model" \
--repo_id "SmilingWolf/wd-swinv2-tagger-v3" \
--recursive \
--remove_underscore \
--use_rating_tags \
--character_tags_first \
--character_tag_expand \
--append_tags \
--onnx \
--caption_separator ", " \
--general_threshold 0.35 \
--character_threshold 0.50 \
--batch_size 4 \
--caption_extension ".txt" \
./
```

| dataset name | total images | total chunks |
|---|---|---|
| **newest** | 1.843.053 | 231 |
| **recent** | 1.652.420 | 207 |
| **mid** | 1.609.608 | 202 |
| **early** | 442.368 | 056 |
| **oldest** | 128.311 | 017 |
| **pixiv** | 594.046 | 075 |
| **visual novel cg** | 560.903 | 071 |
| **anime wallpaper** | 106.882 | 014 |
| **Total** | 6.937.591 | 873 |

**Note**: The smallest image size is 1280x600 (768.000 pixels).

## Tags:

Tags are placed in the caption in this order:

```
aesthetic tags, quality tags, date tags, custom tags, rating tags, character tags, rest of the tags
```

### Date:

| tag | date |
|---|---|
| **newest** | 2022 to 2024 |
| **recent** | 2019 to 2021 |
| **mid** | 2015 to 2018 |
| **early** | 2011 to 2014 |
| **oldest** | 2005 to 2010 |

### Aesthetic Tags:

**Model used**: shadowlilac/aesthetic-shadow-v2

| score greater than | tag |
|---|---|
| **0.90** | extremely aesthetic |
| **0.80** | very aesthetic |
| **0.70** | aesthetic |
| **0.50** | slightly aesthetic |
| **0.40** | not displeasing |
| **0.30** | not aesthetic |
| **0.20** | slightly displeasing |
| **0.10** | displeasing |
| **rest of them** | very displeasing |

### Quality Tags:

**Model used**: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth

| score greater than | tag |
|---|---|
| **0.980** | best quality |
| **0.900** | high quality |
| **0.750** | great quality |
| **0.500** | medium quality |
| **0.250** | normal quality |
| **0.125** | bad quality |
| **0.025** | low quality |
| **rest of them** | worst quality |

### Rating Tags:

- general
- sensitive
- questionable
- explicit

### Custom Tags:

| dataset name | custom tag |
|---|---|
| **image boards** | date, |
| **pixiv** | art by Display_Name, |
| **visual novel cg** | Full_VN_Name (short_3_letter_name), visual novel cg, |
| **anime wallpaper** | date, anime wallpaper, |

## Training Params:

**Software used**: Kohya SD-Scripts with Stable Cascade branch

**Base model**: Disty0/sote-diffusion-cascade_pre-alpha0

### Command:
```
accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
--mixed_precision fp16 \
--save_precision fp16 \
--full_fp16 \
--sdpa \
--gradient_checkpointing \
--train_text_encoder \
--resolution "1024,1024" \
--train_batch_size 2 \
--adaptive_loss_weight \
--learning_rate 4e-6 \
--lr_scheduler constant_with_warmup \
--lr_warmup_steps 100 \
--optimizer_type adafactor \
--optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
--max_grad_norm 0 \
--token_warmup_min 1 \
--token_warmup_step 0 \
--shuffle_caption \
--caption_dropout_rate 0 \
--caption_tag_dropout_rate 0 \
--caption_dropout_every_n_epochs 0 \
--dataset_repeats 1 \
--save_state \
--save_every_n_steps 2048 \
--sample_every_n_steps 512 \
--max_token_length 225 \
--max_train_epochs 1 \
--caption_extension ".txt" \
--max_data_loader_n_workers 2 \
--persistent_data_loader_workers \
--enable_bucket \
--min_bucket_reso 256 \
--max_bucket_reso 4096 \
--bucket_reso_steps 64 \
--bucket_no_upscale \
--log_with tensorboard \
--output_name sotediffusion-sc_3b \
--train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0000 \
--in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0000.json \
--output_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-0 \
--logging_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-0/logs \
--resume /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-step00020480-state \
--stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-step00020480.safetensors \
--text_model_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-step00020480_text_model.safetensors \
--effnet_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors \
--previewer_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/previewer.safetensors \
--sample_prompts /mnt/DataSSD/AI/SoteDiffusion/StableCascade/config/sotediffusion-prompt.txt
```

## Limitations and Bias

### Bias

- This model is intended for anime illustrations. Realistic capabilities are not tested at all.
- Still underbaked.

### Limitations

- Can fall back to a realistic style. Add the "realistic" tag to the negative prompt when this happens.
- Eyes in far shots are still bad due to the heavy latent compression.
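
The "realistic" workaround, together with the score thresholds from the Aesthetic Tags and Quality Tags tables, can be sketched as a few small helpers. This is only an illustration and not the code used to build the dataset: the names `score_to_tag` and `add_negative_tag` are hypothetical, and the strict greater-than comparison is an assumption based on the "score greater than" column.

```python
# Thresholds copied from the Aesthetic Tags and Quality Tags tables.
AESTHETIC_TAGS = [
    (0.90, "extremely aesthetic"),
    (0.80, "very aesthetic"),
    (0.70, "aesthetic"),
    (0.50, "slightly aesthetic"),
    (0.40, "not displeasing"),
    (0.30, "not aesthetic"),
    (0.20, "slightly displeasing"),
    (0.10, "displeasing"),
]
QUALITY_TAGS = [
    (0.980, "best quality"),
    (0.900, "high quality"),
    (0.750, "great quality"),
    (0.500, "medium quality"),
    (0.250, "normal quality"),
    (0.125, "bad quality"),
    (0.025, "low quality"),
]


def score_to_tag(score, thresholds, fallback):
    """Return the tag of the first threshold that the score strictly exceeds.

    Assumption: thresholds are sorted from highest to lowest, and the
    comparison is strict ("score greater than").
    """
    for threshold, tag in thresholds:
        if score > threshold:
            return tag
    return fallback


def add_negative_tag(negative_prompt, tag="realistic"):
    """Append a tag to a comma-separated negative prompt unless it is already there.

    The trailing comma mirrors the prompt style used in the code example.
    """
    tags = [t.strip() for t in negative_prompt.split(",") if t.strip()]
    if tag not in tags:
        tags.append(tag)
    return ", ".join(tags) + ","


if __name__ == "__main__":
    print(score_to_tag(0.95, AESTHETIC_TAGS, "very displeasing"))
    print(score_to_tag(0.01, QUALITY_TAGS, "worst quality"))
    print(add_negative_tag("very displeasing, worst quality, monochrome, sketch, fat, child,"))
```

The string returned by `add_negative_tag` can be passed as `negative_prompt` to both the prior and the decoder pipelines when outputs drift toward a realistic style.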