---

# Small Stable Diffusion Model Card
【Update 2023/02/07】 We have released [a diffusion deployment repo](https://github.com/OFA-Sys/diffusion-deploy) to speed up inference on both GPU (~4x speedup, based on TensorRT) and CPU (~12x speedup, based on Intel OpenVINO). Integrated with that repo, small-stable-diffusion can generate images in just **5 seconds on the CPU**.

Similar image generation quality, but nearly 1/2 the size!
Here are some samples:

![Samples](https://huggingface.co/OFA-Sys/small-stable-diffusion-v0/resolve/main/sample_images_compressed.jpg)
### Training Procedure

After initialization from stable-diffusion v1-4, the model was trained for 1,100k steps on 8 x A100 GPUs. Training consisted of three stages. The first stage is a simple pre-training procedure. In the last two stages, the original stable diffusion model was used as a teacher to distill knowledge into the small model. In all stages, only the parameters of the unet were trained; the other parameters were frozen.

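The distillation objective described above can be sketched as follows. This is an illustrative guess at the loss shape, not the authors' released training code: the student unet's noise prediction is pulled both toward the true noise and toward the teacher unet's prediction.

```python
import numpy as np

def distillation_loss(student_pred, teacher_pred, target_noise, alpha=0.5):
    # Assumed form (not from the model card): mix the usual denoising MSE
    # against the true noise with an MSE term regressing the student unet's
    # prediction onto the frozen teacher's prediction.
    denoise = np.mean((student_pred - target_noise) ** 2)
    distill = np.mean((student_pred - teacher_pred) ** 2)
    return alpha * denoise + (1 - alpha) * distill

# Toy check: a student that matches both targets incurs zero loss.
x = np.zeros((4, 64, 64))
print(distillation_loss(x, x, x))  # → 0.0
```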
- **Hardware:** 8 x A100-80GB GPUs
- **Optimizer:** AdamW

- **Stage 1** - Pretrain the unet part of the model.
  - **Steps:** 500,000
  - **Batch:** batch size=8, GPUs=8, Gradient Accumulations=2. Total batch size=128
  - **Learning rate:** warmup to 1e-5 for 10,000 steps and then kept constant
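The batch and learning-rate settings above compose as follows; a minimal sketch (helper names are ours, not the training code):

```python
def effective_batch_size(per_gpu_batch, gpus, grad_accum):
    # 8 (per GPU) x 8 (GPUs) x 2 (accumulation steps) = 128, as listed above
    return per_gpu_batch * gpus * grad_accum

def lr_at_step(step, peak_lr=1e-5, warmup_steps=10_000):
    # Linear warmup to the peak learning rate, then held constant.
    if step < warmup_steps:
        return peak_lr * (step / warmup_steps)
    return peak_lr

print(effective_batch_size(8, 8, 2))  # → 128
print(lr_at_step(5_000))              # → 5e-06 (halfway through warmup)
print(lr_at_step(200_000))            # → 1e-05 (constant after warmup)
```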