---
license: apache-2.0
---

Nowadays, text-to-image (T2I) models are growing ever stronger but also larger, which limits their practical applicability, especially on consumer-grade devices.
To bridge this gap, we distilled the **12B** `Flux-dev` model into a **3.2B** `Flux-mini` model, aiming to preserve its strong image generation capability.
Specifically, we pruned the original `Flux-dev` by reducing its depth from `19 + 38` (the numbers of double blocks and single blocks) to `5 + 10`.
The pruned model was further tuned with denoising and feature alignment objectives on a curated image-text dataset.

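A minimal sketch of this kind of depth pruning, assuming blocks are simply kept or dropped by index; the importance scores and the `select_blocks` helper below are illustrative, not part of the released code:

```python
# Hypothetical sketch: rank transformer blocks by an importance score and
# keep only the top-k, preserving their original order in the network.

def select_blocks(importance, keep):
    """Return the indices of the `keep` highest-scoring blocks, in order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:keep])

# Toy example with 6 blocks, keeping 3:
scores = [0.2, 0.9, 0.1, 0.7, 0.5, 0.8]
print(select_blocks(scores, 3))  # -> [1, 3, 5]

# For Flux-dev -> Flux-mini, the same idea would keep 5 of 19 double
# blocks and 10 of 38 single blocks.
```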
We empirically found that different blocks have different impacts on generation quality, so we initialized the student model with the most important blocks.
The distillation process combines three objectives: a denoising loss, an output alignment loss, and a feature alignment loss.
The feature alignment loss encourages the output of `block-x` in the student model to match that of `block-4x` in the teacher model.
Distillation was performed in two stages: first on `512x512` LAION images recaptioned with `Qwen-VL` for `90k` steps,
then on `1024x1024` images generated by `Flux` with prompts from `JourneyDB` for another `90k` steps.

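A minimal NumPy sketch of how the three objectives could be combined, assuming a simple MSE for each term and equal loss weights; the shapes, weighting, and helper names are assumptions, not taken from the released training code:

```python
# Illustrative combination of the three distillation objectives.
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def distill_losses(student_feats, teacher_feats, student_out, teacher_out,
                   student_pred, noise_target):
    # 1) Denoising loss: student prediction vs. the diffusion target.
    denoise = mse(student_pred, noise_target)
    # 2) Output alignment: student final output vs. teacher final output.
    out_align = mse(student_out, teacher_out)
    # 3) Feature alignment: student block-x vs. teacher block-4x.
    feat_align = sum(
        mse(student_feats[x], teacher_feats[4 * x])
        for x in range(len(student_feats))
    ) / len(student_feats)
    # Equal weighting is an assumption for illustration.
    return denoise + out_align + feat_align
```

With 5 student double blocks against 19 teacher double blocks, the `block-x` to `block-4x` mapping pairs student blocks 0..4 with teacher blocks 0, 4, 8, 12, 16.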
GitHub: https://github.com/TencentARC/flux-toolkits