metadata
license: other
license_name: fair-ai-public-license-1.0-sd
license_link: https://freedevproject.org/faipl-1.0-sd/
language:
  - en
base_model:
  - Laxhar/noobai-XL-1.0
pipeline_tag: text-to-image
library_name: diffusers
tags:
  - safetensors
  - diffusers
  - stable-diffusion
  - stable-diffusion-xl
  - art

V-Prediction Loss Weighting Test

Notice

This repository contains personal experimental records. No guarantees are made regarding accuracy or reproducibility.

Overview

This repository is a test project comparing different loss weighting schemes for Stable Diffusion v-prediction training.

Environment

  • sd-scripts dev branch
    • Commit hash: 6adb69b, with local modifications

Test Cases

This repository includes test models using different weighting schemes:

  1. test_normal_weight

    • Baseline model using standard weighting
  2. test_edm2_weighting

    • New loss weighting scheme
    • Implementation by A
  3. test_min_snr_1 (incomplete)

    • Baseline model with --min_snr_gamma=1
  4. test_debias_scale-like (incomplete)

    • Baseline model with additional parameters:
      • --debiased_estimation_loss
      • --scale_v_pred_loss_like_noise_pred
  5. test_edm2_weight_new (incomplete)

    • New loss weighting scheme
    • Implementation by madman404
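
The flags in cases 3 and 4 correspond to per-timestep loss weights that depend on the signal-to-noise ratio (SNR). The following is a rough sketch using the commonly cited formulas; the modified sd-scripts code used here may differ in detail, and the function names and the `max_snr` clamp are illustrative assumptions. The last function shows an EDM2-style uncertainty weighting in scalar form, where `log_var` stands in for the output of a small learned network. This card does not confirm that cases 2 and 5 match it exactly.

```python
import math


def min_snr_weight_v_pred(snr: float, gamma: float) -> float:
    """Min-SNR-gamma weight for v-prediction: min(SNR, gamma) / (SNR + 1)."""
    return min(snr, gamma) / (snr + 1.0)


def debiased_estimation_weight(snr: float, max_snr: float = 1000.0) -> float:
    """Debiased-estimation weight 1 / sqrt(SNR), with SNR clamped (assumed clamp)."""
    return 1.0 / math.sqrt(min(snr, max_snr))


def v_pred_like_noise_pred_scale(snr: float, max_snr: float = 1000.0) -> float:
    """Scale SNR / (SNR + 1) that makes a v-prediction loss resemble a noise-prediction loss."""
    clamped = min(snr, max_snr)
    return clamped / (clamped + 1.0)


def edm2_uncertainty_loss(raw_loss: float, log_var: float) -> float:
    """EDM2-style weighting: raw_loss / exp(log_var) + log_var.

    log_var is a learned, noise-level-dependent log-variance (hypothetical input here).
    """
    return raw_loss / math.exp(log_var) + log_var


if __name__ == "__main__":
    # Compare the fixed weighting schemes at a few SNR values.
    for snr in (0.1, 1.0, 10.0, 100.0):
        print(
            snr,
            min_snr_weight_v_pred(snr, gamma=1.0),
            debiased_estimation_weight(snr),
            v_pred_like_noise_pred_scale(snr),
        )
```

With `gamma=1`, the min-SNR weight falls toward zero at both very low and very high SNR, whereas the debiased weight keeps growing as SNR decreases; this is the kind of difference the test cases are meant to expose.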

Training Parameters

For detailed parameters, please refer to the .toml files in each model directory. Each model was trained with the sdxl_train.py found in its own directory. In addition, test_edm2_weighting uses t.py, and test_edm2_weight_new uses lossweightMLP.py.

Common parameters:

  • Samples: 57,373
  • Epochs: 3
  • U-Net only
  • Learning rate: 3.5e-6
  • Batch size: 8
  • Gradient accumulation steps: 4
  • Optimizer: Adafactor (stochastic rounding)
  • Training time: 13.5 GPU hours (RTX4090) per trial
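
As a quick sanity check on the numbers above, the effective batch size and an approximate optimizer step count can be derived from the listed parameters. This is a back-of-the-envelope sketch that assumes a single GPU, no dataset repeats, and ignores aspect-ratio bucketing, so the actual step count reported by sd-scripts may differ slightly.

```python
import math

# Values taken from the common parameters listed above.
samples = 57_373
epochs = 3
batch_size = 8
grad_accum_steps = 4
gpu_hours = 13.5

# Effective batch size per optimizer step.
effective_batch = batch_size * grad_accum_steps

# Approximate optimizer steps (assumes no repeats and no bucketing effects).
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs

# Average throughput implied by the reported training time.
samples_per_second = samples * epochs / (gpu_hours * 3600)

print(effective_batch, steps_per_epoch, total_steps, round(samples_per_second, 2))
```

Under these assumptions each trial corresponds to an effective batch size of 32 and roughly 5,379 optimizer steps.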

Dataset Information

The dataset used for testing consists of:

  • ~53,000 images extracted from danbooru2023 based on specific artist styles (approximately 300 artists)
  • ~4,000 carefully selected danbooru images for standardization

Note: As this dataset is a subset of my regular training data focused on specific artists, the model's generalization might be limited. A wildcard file (wildcard_style.txt) containing the list of included artists is provided for reference.

Tag Format

The training follows the tag format from Kohaku-XL-Epsilon: <1girl/1boy/1other/...>, <character>, <series>, <artists>, <general tags>, <quality tags>, <year tags>, <meta tags>, <rating tags>
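
To illustrate the ordering, here is a minimal sketch of a helper that assembles a caption in this tag order. The group keys and the function name are hypothetical, and the example tag values are placeholders rather than entries from the training data.

```python
# Group keys are hypothetical labels for the slots in the tag format above.
TAG_ORDER = [
    "subject",    # 1girl / 1boy / 1other / ...
    "character",
    "series",
    "artists",
    "general",
    "quality",
    "year",
    "meta",
    "rating",
]


def build_caption(groups: dict[str, list[str]]) -> str:
    """Join tag groups in the documented order, skipping groups that are absent."""
    parts: list[str] = []
    for key in TAG_ORDER:
        parts.extend(groups.get(key, []))
    return ", ".join(parts)


if __name__ == "__main__":
    # Placeholder tags for illustration only.
    print(build_caption({
        "general": ["solo", "smile"],
        "subject": ["1girl"],
        "artists": ["ciloranko"],
    }))
```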

Style Prompts

The following style prompts from Kohaku-XL-Epsilon might be compatible (untested):

ask \(askzy\), torino aqua, migolu, (jiu ye sang:1.1), (rumoon:0.9), (mizumi zumi:1.1)
ciloranko, maccha \(mochancc\), lobelia \(saclia\), migolu, 
ask \(askzy\), wanke, (jiu ye sang:1.1), (rumoon:0.9), (mizumi zumi:1.1)
shiro9jira, ciloranko, ask \(askzy\), (tianliang duohe fangdongye:0.8)
(azuuru:1.1), (torino aqua:1.2), (azuuru:1.1), kedama milk, 
fuzichoco, ask \(askzy\), chen bin, atdan, hito, mignon
ask \(askzy\), torino aqua, migolu
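
The prompts above mix two kinds of parentheses: escaped ones such as `\(askzy\)` are literal parts of a tag name, while unescaped ones such as `(rumoon:0.9)` carry an emphasis weight. The sketch below, which assumes the common web-UI style weighting syntax and uses a hypothetical function name, splits a prompt line into (tag, weight) pairs. It does not handle nested or unweighted emphasis parentheses.

```python
import re

# Matches a whole tag of the form "(name:weight)".
_WEIGHTED = re.compile(r"^\((.+):([0-9]*\.?[0-9]+)\)$")


def parse_style_prompt(prompt: str) -> list[tuple[str, float]]:
    """Split a comma-separated prompt into (tag, weight) pairs."""
    tags: list[tuple[str, float]] = []
    for raw in prompt.split(","):
        tag = raw.strip()
        if not tag:
            continue  # tolerate trailing commas
        match = _WEIGHTED.match(tag)
        if match:
            tags.append((match.group(1).strip(), float(match.group(2))))
        else:
            # Unescape literal parentheses; unweighted tags default to 1.0.
            tags.append((tag.replace(r"\(", "(").replace(r"\)", ")"), 1.0))
    return tags


if __name__ == "__main__":
    print(parse_style_prompt(r"ask \(askzy\), wanke, (jiu ye sang:1.1), (rumoon:0.9)"))
```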

This model card was written with the assistance of Claude 3.5 Sonnet.