license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
pipeline_tag: text-to-image
Flux-Mini
A 3.2B MMDiT distilled from Flux-dev for efficient text-to-image generation
Text-to-image (T2I) models keep getting stronger but also larger, which limits their practical use, especially on consumer-grade devices.
To bridge this gap, we distilled the 12B Flux-dev model into a 3.2B Flux-mini model, trying to preserve its strong image generation capabilities.
Specifically, we prune the original Flux-dev by reducing its depth from 19 + 38 (number of double blocks + number of single blocks) to 5 + 10.
The pruned model is further tuned with denoising and feature alignment objectives on a curated image-text dataset.
We found empirically that different blocks affect generation quality to different degrees, so we initialize the student model with the most important teacher blocks.
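The initialization step can be illustrated with a small sketch. It assumes each teacher block already has a scalar importance score; how those scores are measured is not described here, so the scores and the `select_blocks` helper below are hypothetical placeholders, not the released implementation.

```python
from typing import Dict, List


def select_blocks(importance: Dict[int, float], keep: int) -> List[int]:
    """Return the indices of the `keep` highest-scoring blocks, in depth order."""
    if not 0 < keep <= len(importance):
        raise ValueError("keep must be between 1 and the number of blocks")
    # Rank block indices by score, highest first, then restore depth order
    # so the student preserves the teacher's layer ordering.
    ranked = sorted(importance, key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:keep])


# Hypothetical importance scores (NOT measured values): a fixed permutation
# used only to make the example deterministic.
double_scores = {i: float((7 * i) % 19) for i in range(19)}  # 19 double blocks
single_scores = {i: float((7 * i) % 38) for i in range(38)}  # 38 single blocks

student_double = select_blocks(double_scores, keep=5)   # 19 -> 5
student_single = select_blocks(single_scores, keep=10)  # 38 -> 10
print(student_double, student_single)
```

The selected indices would then decide which teacher blocks are copied into the 5 + 10 student blocks before distillation starts.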
The distillation process combines three objectives: the denoising loss, the output alignment loss, and the feature alignment loss.
The feature alignment loss encourages the output of block-x in the student model to match that of block-4x in the teacher model.
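A framework-free sketch of how the three objectives might be combined is shown below. Several details are assumptions made for illustration: mean squared error as the distance for every term, 0-based block indices (so student blocks 0..4 map to teacher blocks 0, 4, ..., 16), and the weights `w_out` and `w_feat`. In real training these would be tensor operations rather than Python lists.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def teacher_index(student_index: int, stride: int = 4) -> int:
    """Map student block-x to teacher block-4x (0-based indexing assumed)."""
    return stride * student_index


def distill_loss(
    target: Vector,
    student_out: Vector,
    teacher_out: Vector,
    student_feats: Sequence[Vector],
    teacher_feats: Sequence[Vector],
    w_out: float = 1.0,   # hypothetical weight
    w_feat: float = 1.0,  # hypothetical weight
) -> float:
    # Denoising loss: student prediction against the training target.
    denoise = mse(student_out, target)
    # Output alignment: student prediction against the teacher prediction.
    output_align = mse(student_out, teacher_out)
    # Feature alignment: student block x against teacher block 4x.
    terms = [
        mse(feat, teacher_feats[teacher_index(x)])
        for x, feat in enumerate(student_feats)
    ]
    feature_align = sum(terms) / len(terms)
    return denoise + w_out * output_align + w_feat * feature_align


# The 5 student double blocks map to these teacher double blocks:
print([teacher_index(x) for x in range(5)])
```

Under the 0-based assumption the mapping stays inside the teacher's depth for both block types: the last student single block (index 9) maps to teacher single block 36 out of 38.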
The distillation is performed in two stages: first on 512x512 LAION images recaptioned with Qwen-VL for 90k steps, and then on 1024x1024 images generated by Flux using the prompts in JourneyDB for another 90k steps.
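The two-stage schedule can be written down as a small configuration sketch. The `Stage` class, the stage names, and the `stage_at` helper are hypothetical and only restate the resolutions, data sources, and step counts given above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    resolution: int  # square images: resolution x resolution
    data: str
    steps: int


SCHEDULE: Tuple[Stage, ...] = (
    Stage("stage1", 512, "LAION images recaptioned with Qwen-VL", 90_000),
    Stage("stage2", 1024, "Flux-generated images from JourneyDB prompts", 90_000),
)


def stage_at(step: int, schedule: Tuple[Stage, ...] = SCHEDULE) -> Stage:
    """Return the stage that a 0-based global training step falls into."""
    if step < 0:
        raise ValueError("step must be non-negative")
    start = 0
    for stage in schedule:
        if step < start + stage.steps:
            return stage
        start += stage.steps
    raise ValueError("step is beyond the end of the schedule")


total_steps = sum(stage.steps for stage in SCHEDULE)
print(total_steps, stage_at(90_000).resolution)
```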
GitHub link: https://github.com/TencentARC/flux-toolkits