---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
pipeline_tag: text-to-image
---

# Flux-Mini

A 3.2B MMDiT distilled from Flux-dev for efficient text-to-image generation.

<div align="center">
  <img src="flux_distill-flux-mini-teaser.jpg" width="800" alt="Teaser image">
</div>

Text-to-image (T2I) models keep growing stronger but also larger, which limits their practical use, especially on consumer-level devices.
To bridge this gap, we distilled the **12B** `Flux-dev` model into a **3.2B** `Flux-mini` model while trying to preserve its strong image generation capabilities.
Specifically, we prune the original `Flux-dev` by reducing its depth from `19 + 38` blocks (double blocks + single blocks) to `5 + 10`.
The pruned model is then further tuned with denoising and feature alignment objectives on a curated image-text dataset.

We empirically found that different blocks affect generation quality to different degrees, so we initialize the student model from the most important teacher blocks.
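
A minimal sketch of this kind of importance-based initialization, assuming each teacher block already has a scalar importance score. The scoring method is not described here, so `select_blocks` and the placeholder scores below are hypothetical; they only illustrate picking `5` of `19` double blocks and `10` of `38` single blocks while keeping their original depth order.

```python
from typing import List, Sequence


def select_blocks(importance: Sequence[float], keep: int) -> List[int]:
    """Return the indices of the `keep` highest-scoring blocks, in depth order."""
    if not 0 < keep <= len(importance):
        raise ValueError("keep must be between 1 and the number of blocks")
    # Rank block indices from most to least important.
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    # Keep the top `keep` blocks, restored to their original order in the network.
    return sorted(ranked[:keep])


# Placeholder scores, NOT measured values: any per-block importance estimate
# (for example, the quality drop observed when a block is skipped) could go here.
double_scores = [((7 * i) % 19) / 19 for i in range(19)]
single_scores = [((11 * i) % 38) / 38 for i in range(38)]

student_double = select_blocks(double_scores, keep=5)   # 5 of 19 double blocks
student_single = select_blocks(single_scores, keep=10)  # 10 of 38 single blocks
```

The selected indices would then determine which teacher weights are copied into the student before distillation starts.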

The distillation uses three objectives: a denoising loss, an output alignment loss, and a feature alignment loss.
The feature alignment loss encourages the output of `block-x` in the student model to match that of `block-4x` in the teacher model.
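
The three objectives can be sketched as a single weighted sum. This is an illustration under stated assumptions, not the training code: the distance function (mean squared error), the loss weights, the 0-based block indexing, and the names `mse`, `teacher_index`, and `distill_loss` are all hypothetical, since none of them is specified here.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def teacher_index(student_idx: int, ratio: int = 4) -> int:
    """Map student block x to teacher block 4x (0-based indexing is an assumption)."""
    return ratio * student_idx


def distill_loss(target, student_out, teacher_out, student_feats, teacher_feats,
                 w_denoise=1.0, w_output=1.0, w_feature=1.0) -> float:
    """Weighted sum of denoising, output alignment, and feature alignment terms."""
    # Denoising loss: student prediction against the ground-truth target.
    denoise = mse(student_out, target)
    # Output alignment loss: student prediction against the teacher prediction.
    output_align = mse(student_out, teacher_out)
    # Feature alignment loss: student block x against teacher block 4x, averaged.
    feature_align = sum(
        mse(feat, teacher_feats[teacher_index(i)])
        for i, feat in enumerate(student_feats)
    ) / len(student_feats)
    return w_denoise * denoise + w_output * output_align + w_feature * feature_align
```

Because the student is pruned in depth only, its block outputs keep the teacher's width, so the features can be compared directly without a projection layer in this sketch.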

The distillation runs in two stages: the first uses `512x512` LAION images recaptioned with `Qwen-VL` for `90k` steps, and the second uses `1024x1024` images generated by `Flux` from the prompts in `JourneyDB` for another `90k` steps.
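
The two-stage schedule can be written down as a small configuration. This is only a sketch: the `Stage` class and `stage_for_step` helper are hypothetical names, and only the resolutions, step counts, and data sources come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    resolution: int  # square image side length in pixels
    steps: int       # number of training steps in this stage
    data: str        # description of the training data


STAGES: List[Stage] = [
    Stage("stage1", 512, 90_000, "LAION images recaptioned with Qwen-VL"),
    Stage("stage2", 1024, 90_000, "Flux-generated images from JourneyDB prompts"),
]


def stage_for_step(step: int) -> Stage:
    """Return the stage that a global 0-based training step falls into."""
    if step < 0:
        raise ValueError("step must be non-negative")
    start = 0
    for stage in STAGES:
        if step < start + stage.steps:
            return stage
        start += stage.steps
    raise ValueError("step is beyond the end of the schedule")
```

A training loop could call `stage_for_step(step)` to decide which dataset and resolution to sample from at each step.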

GitHub link: https://github.com/TencentARC/flux-toolkits