---

# Small Stable Diffusion Model Card
【Update 2023/02/07】 We have released [a diffusion deployment repo](https://github.com/OFA-Sys/diffusion-deploy) to speed up inference on both GPU (~4x speedup, based on TensorRT) and CPU (~12x speedup, based on Intel OpenVINO). Integrated with that repo, small-stable-diffusion can generate images in just **5 seconds on the CPU**.

Similar image generation quality, but nearly 1/2 the size!
Here are some samples:

![Samples](https://huggingface.co/OFA-Sys/small-stable-diffusion-v0/resolve/main/sample_images_compressed.jpg)
### Training Procedure

After initialization from stable-diffusion v1-4, the model was trained for 1,100k steps on 8 x A100 GPUs. Training consisted of three stages. The first stage is a simple pre-training procedure. In the last two stages, the original stable diffusion model was used as a teacher to distill knowledge into the small model. In all stages, only the parameters of the unet were trained; the other parameters were frozen.

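The distillation objective described above can be sketched as follows. This is an illustrative guess at the loss shape, not the authors' released training code: the student unet's noise prediction is pulled both toward the true noise and toward the teacher unet's prediction.

```python
import numpy as np

def distillation_loss(student_pred, teacher_pred, target_noise, alpha=0.5):
    # Assumed form (not from the model card): mix the usual denoising MSE
    # against the true noise with an MSE term regressing the student unet's
    # prediction onto the frozen teacher's prediction.
    denoise = np.mean((student_pred - target_noise) ** 2)
    distill = np.mean((student_pred - teacher_pred) ** 2)
    return alpha * denoise + (1 - alpha) * distill

# Toy check: a student that matches both targets incurs zero loss.
x = np.zeros((4, 64, 64))
print(distillation_loss(x, x, x))  # → 0.0
```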
- **Hardware:** 8 x A100-80GB GPUs
- **Optimizer:** AdamW

- **Stage 1** - Pretrain the unet part of the model.
  - **Steps:** 500,000
  - **Batch:** batch size=8, GPUs=8, Gradient Accumulations=2. Total batch size=128
  - **Learning rate:** warmup to 1e-5 for 10,000 steps and then kept constant
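The batch and learning-rate settings above compose as follows; a minimal sketch (helper names are ours, not the training code):

```python
def effective_batch_size(per_gpu_batch, gpus, grad_accum):
    # 8 (per GPU) x 8 (GPUs) x 2 (accumulation steps) = 128, as listed above
    return per_gpu_batch * gpus * grad_accum

def lr_at_step(step, peak_lr=1e-5, warmup_steps=10_000):
    # Linear warmup to the peak learning rate, then held constant.
    if step < warmup_steps:
        return peak_lr * (step / warmup_steps)
    return peak_lr

print(effective_batch_size(8, 8, 2))  # → 128
print(lr_at_step(5_000))              # → 5e-06 (halfway through warmup)
print(lr_at_step(200_000))            # → 1e-05 (constant after warmup)
```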