# Dreambooth for the inpainting model

This script was added by @thedarkzeno.

Please note that this script is not actively maintained. You can still open an issue and tag @thedarkzeno or @patil-suraj.

```bash
export MODEL_NAME="runwayml/stable-diffusion-inpainting"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth_inpaint.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=400
```

### Training with prior-preservation loss

Prior-preservation is used to avoid overfitting and language drift. Refer to the DreamBooth paper to learn more about it. For prior-preservation, we first generate images using the model with a class prompt and then use those images during training alongside our own data.
According to the paper, it's recommended to generate `num_epochs * num_samples` images for prior-preservation. In practice, 200-300 images work well for most cases.

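As a rough illustration of how the two terms of a prior-preservation objective combine, the sketch below adds a mean squared error on the instance batch to a weighted mean squared error on the class (prior) batch, with the weight playing the role of `--prior_loss_weight`. It uses plain Python lists instead of tensors, and the function names `mse` and `prior_preservation_loss` are hypothetical, not taken from the training script.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(instance_pred, instance_target,
                            class_pred, class_target,
                            prior_loss_weight=1.0):
    """Instance loss plus a weighted loss on the generated class images.

    Illustrative only: a real training loop would operate on noise
    predictions stored as tensors, not on Python lists.
    """
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


# Toy numbers: instance loss is 2.0, prior loss is 1.0, weight is 0.5.
total = prior_preservation_loss([1.0, 2.0], [1.0, 4.0], [0.0], [1.0],
                                prior_loss_weight=0.5)
print(total)
```

With a weight of `1.0`, both terms contribute equally; a smaller weight lets the instance images dominate.
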

```bash
export MODEL_NAME="runwayml/stable-diffusion-inpainting"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth_inpaint.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```

### Training with gradient checkpointing and 8-bit optimizer

With the help of gradient checkpointing and the 8-bit optimizer from bitsandbytes, it's possible to train Dreambooth on a 16GB GPU.

To install `bitsandbytes`, please refer to this [readme](https://github.com/TimDettmers/bitsandbytes#requirements--installation).

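Before launching a run with `--use_8bit_adam`, it can be worth confirming that the package is visible in the active environment. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


if not has_module("bitsandbytes"):
    print("bitsandbytes not found; install it before using --use_8bit_adam")
```

This only checks that the package can be located, not that its CUDA binaries load correctly on the machine.
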

```bash
export MODEL_NAME="runwayml/stable-diffusion-inpainting"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth_inpaint.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
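
The command above sets `--gradient_accumulation_steps=2` with `--train_batch_size=1`. Gradient accumulation sums gradients over several forward/backward passes before each optimizer step, so more samples contribute to one update than fit in a single per-device batch. A small sketch of that arithmetic, assuming the usual convention that per-device batch size, accumulation steps, and process count multiply (the function name and the `num_processes` parameter are illustrative):

```python
def effective_batch_size(train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_processes: int = 1) -> int:
    """Number of samples contributing to a single optimizer step."""
    values = (train_batch_size, gradient_accumulation_steps, num_processes)
    if any(v < 1 for v in values):
        raise ValueError("all arguments must be positive integers")
    return train_batch_size * gradient_accumulation_steps * num_processes


# Values from the 16GB example, assuming a single GPU.
print(effective_batch_size(1, 2))
```
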

### Fine-tune text encoder with the UNet

The script also allows you to fine-tune the `text_encoder` along with the `unet`. It's been observed experimentally that fine-tuning the `text_encoder` gives much better results, especially on faces.
Pass the `--train_text_encoder` argument to the script to enable training the `text_encoder`.

___Note: Training the text encoder requires more memory; with this option, training won't fit on a 16GB GPU. It needs at least 24GB of VRAM.___


```bash
export MODEL_NAME="runwayml/stable-diffusion-inpainting"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth_inpaint.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --use_8bit_adam \
  --gradient_checkpointing \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
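
The memory notes in this README can be condensed into a small helper. This is only a sketch: `suggest_flags` is a hypothetical function, the thresholds are the 16GB and 24GB figures quoted above, and actual memory use will depend on the GPU, the library versions, and the other arguments.

```python
from typing import List


def suggest_flags(vram_gb: float) -> List[str]:
    """Map available VRAM to the flag sets used in this README's examples."""
    # Both memory-saving flags appear in the 16GB and the 24GB examples.
    flags = ["--gradient_checkpointing", "--use_8bit_adam"]
    if vram_gb >= 24:
        # The README states that text-encoder training needs at least 24GB.
        flags.insert(0, "--train_text_encoder")
    elif vram_gb < 16:
        # The README gives no guidance below 16GB, so refuse to guess.
        raise ValueError("no configuration is documented for less than 16GB")
    return flags


print(" ".join(suggest_flags(16)))
print(" ".join(suggest_flags(24)))
```
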