metadata

base_model: black-forest-labs/FLUX.1-dev
base_model_relation: merge
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
tags:
  - text-to-image
  - image-generation
  - flux
  - merge

Do you feel like you have too much VRAM lately? Want to OOM on a 40GB A100? This is the model for you!

About

This is a 17B self-merge of the original 12B parameter Flux.1-dev model.

Merging was done similarly to 70B->120B LLM merges, with the layers repeated and interwoven in groups.

Final model stats:
 p layers: [    32]
 s layers: [    44]
 n params: [17.17B]
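
The parameter count can be sanity-checked with a back-of-the-envelope tally. The sketch below assumes the reference FLUX.1 transformer dimensions (hidden size 3072, MLP ratio 4, 128-dim heads, 64 latent channels, 4096-dim T5 and 768-dim CLIP inputs) and the base model's 19 double / 38 single layers. None of those numbers come from this card, and the helper names are made up for illustration.

```python
# Rough parameter tally for a FLUX-style transformer.
# ASSUMPTION: every dimension below is taken from the reference FLUX.1
# architecture, not from this model card.
HIDDEN = 3072
MLP = 4 * HIDDEN
HEAD_DIM = 128
LATENT = 64      # packed latent channels
T5_DIM = 4096    # text encoder hidden size
CLIP_DIM = 768   # pooled text embedding size
TIME_DIM = 256   # sinusoidal timestep / guidance embedding size


def linear(n_in, n_out):
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def double_block():
    """One "p" block: identical image and text streams."""
    stream = (
        linear(HIDDEN, 6 * HIDDEN)    # modulation
        + linear(HIDDEN, 3 * HIDDEN)  # fused QKV
        + 2 * HEAD_DIM                # query/key norm scales
        + linear(HIDDEN, HIDDEN)      # attention output projection
        + linear(HIDDEN, MLP)         # MLP up
        + linear(MLP, HIDDEN)         # MLP down
    )
    return 2 * stream


def single_block():
    """One "s" block: attention and MLP fused into two linears."""
    return (
        linear(HIDDEN, 3 * HIDDEN + MLP)  # fused QKV + MLP up
        + linear(HIDDEN + MLP, HIDDEN)    # fused attention out + MLP down
        + 2 * HEAD_DIM                    # query/key norm scales
        + linear(HIDDEN, 3 * HIDDEN)      # modulation
    )


def embedder(n_in):
    """Two-layer MLP embedder."""
    return linear(n_in, HIDDEN) + linear(HIDDEN, HIDDEN)


def other_params():
    """Everything outside the repeated blocks."""
    return (
        linear(LATENT, HIDDEN)        # image input projection
        + linear(T5_DIM, HIDDEN)      # text input projection
        + embedder(TIME_DIM)          # timestep embedder
        + embedder(TIME_DIM)          # guidance embedder
        + embedder(CLIP_DIM)          # pooled-text embedder
        + linear(HIDDEN, 2 * HIDDEN)  # final layer modulation
        + linear(HIDDEN, LATENT)      # final output projection
    )


def total_params(p_layers, s_layers):
    return p_layers * double_block() + s_layers * single_block() + other_params()


# 19/38 is the assumed base layout; 32/44 is the layout listed in the stats.
for name, p, s in [("base (assumed 19/38)", 19, 38), ("this merge (32/44)", 32, 44)]:
    print(f"{name}: {total_params(p, s) / 1e9:.2f}B")
```

Under those assumptions the second line comes out at 17.17B, which lines up with the stats listed here.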

Training

Some post-merge training was done to try and reverse the extensive braindamage the model has suffered, but even after that this is mostly a proof of concept due to not having any hardware capable of properly training this. Still, I think it might be the first open source 17B image model that actually generates coherent images, even if it's just a self-merge.

You can see the text recovering with training. Leftmost image is step 0 base merge:

Usage

Good luck.

Diffusers

Should work with the regular inference pipeline. from_single_file seems to need the custom layer counts passed explicitly:

from diffusers import FluxTransformer2DModel

model = FluxTransformer2DModel.from_single_file("flux.1-heavy-17B.safetensors", num_layers=32, num_single_layers=44)
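
For a full text-to-image run, something along these lines should work. It is an untested sketch: it assumes the remaining components (text encoders, VAE, scheduler) can be taken from the base FLUX.1-dev repo, that BF16 is used throughout, and that model CPU offload is wanted. The function name, prompt, and sampler settings are placeholders, not recommendations from this card.

```python
def load_heavy_pipeline(checkpoint="flux.1-heavy-17B.safetensors"):
    """Build a FluxPipeline around the merged transformer (sketch only)."""
    # Imported lazily so the heavy dependencies only load when this is called.
    import torch
    from diffusers import FluxPipeline, FluxTransformer2DModel

    # The custom layer counts match the stats listed in this card.
    transformer = FluxTransformer2DModel.from_single_file(
        checkpoint,
        num_layers=32,
        num_single_layers=44,
        torch_dtype=torch.bfloat16,
    )

    # ASSUMPTION: everything except the transformer comes from the base repo.
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    )

    # Moves each component to the GPU only while it is in use.
    pipe.enable_model_cpu_offload()
    return pipe


# Example use (needs the checkpoint on disk and a lot of memory):
# pipe = load_heavy_pipeline()
# image = pipe("a cat holding a sign", num_inference_steps=28, guidance_scale=3.5).images[0]
# image.save("out.png")
```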

Comfy

Just load it normally via the "Load Diffusion Model" node. You need around 80 GB of system RAM on Windows for it to not swap to disk lol.

It requires about 35-40 GB of VRAM for inference, assuming you offload the text encoder and unload it during VAE decoding. Partial offloading works if you have enough system RAM.
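
The VRAM figure is mostly just the weights. A quick estimate, counting only the transformer parameters and ignoring activations, the VAE, and framework overhead (so treat it as a lower bound):

```python
def weight_gb(n_params, bytes_per_param):
    """Size of the raw weights in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 17.17e9  # parameter count from the model stats

print(f"BF16: {weight_gb(N_PARAMS, 2):.1f} GB")  # 2 bytes per parameter
print(f"FP8:  {weight_gb(N_PARAMS, 1):.1f} GB")  # 1 byte per parameter, hypothetical
```

The BF16 weights alone come to roughly 34 GB, which leaves little room on a 40 GB card once activations are added.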

Training

Seems to work out of the box with ostris/ai-toolkit, at least it did when I pointed config -> process -> model -> name_or_path to it in a local folder.

Q&A:

Should I use this model?

No, unless you want to brag about it or, God forbid, train it into something usable.

Where is the merge script?

It's a mess of like 3-4 scripts and some questionable manual editing on some of the biases. You can replicate it if you put the layers after each other with some overlap similarly to this, just leave the later single layers alone.
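
As a rough idea of what "layers after each other with some overlap" can look like, here is a toy index schedule. It is not the actual merge script: the window size, step, and number of untouched tail layers are made-up illustration values, and the function names are hypothetical.

```python
def overlapping_schedule(n_layers, group, step):
    """Source-layer indices for a stack built from overlapping windows.

    Windows of `group` consecutive layers start every `step` layers, so each
    window repeats the last `group - step` layers of the previous one.
    """
    if not 0 < step <= group <= n_layers:
        raise ValueError("need 0 < step <= group <= n_layers")
    if (n_layers - group) % step:
        raise ValueError("windows must end exactly on the last layer")
    schedule = []
    for start in range(0, n_layers - group + 1, step):
        schedule.extend(range(start, start + group))
    return schedule


def schedule_with_fixed_tail(n_layers, n_tail, group, step):
    """Interleave the early layers only and copy the last `n_tail` unchanged."""
    head = overlapping_schedule(n_layers - n_tail, group, step)
    tail = list(range(n_layers - n_tail, n_layers))
    return head + tail


# Toy example: 6 source layers, windows of 4, stepping by 2.
print(overlapping_schedule(6, 4, 2))         # [0, 1, 2, 3, 2, 3, 4, 5]
# Toy example: 8 source layers, last 2 left alone.
print(schedule_with_fixed_tail(8, 2, 4, 2))  # [0, 1, 2, 3, 2, 3, 4, 5, 6, 7]
```

Each index in the returned list names the source layer whose weights would be copied into that position of the larger model.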

The merged (untrained) weights are in this repo in the raw folder. You can from_single_file -> save_pretrained with FluxTransformer2DModel if you need those in the diffusers format.
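
A minimal conversion sketch, assuming the raw checkpoint uses the same 32/44 layer layout as the trained one. Both paths are arguments you supply yourself; the function name is hypothetical.

```python
def convert_to_diffusers(single_file_path, output_dir):
    """Load a single-file checkpoint and re-save it in the diffusers layout."""
    # Imported lazily so the heavy dependencies only load when this is called.
    import torch
    from diffusers import FluxTransformer2DModel

    # ASSUMPTION: the raw merge has the same layer counts as the trained model.
    model = FluxTransformer2DModel.from_single_file(
        single_file_path,
        num_layers=32,
        num_single_layers=44,
        torch_dtype=torch.bfloat16,
    )
    # Writes config.json plus sharded safetensors into output_dir.
    model.save_pretrained(output_dir)
    return output_dir


# Example use, with placeholder paths:
# convert_to_diffusers("path/to/raw_checkpoint.safetensors", "path/to/output_dir")
```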

GGUF? FP8?

It's essential to experience this model in BF16 precision for the full experience of running out of every kind of resource at the same time while trying to run it.

(I'd put some up but I'm out of RunPod credits again)

Settings? LoRA compatibility?

Just use the same settings you'd use for regular flux. LoRAs do seem to have at least some effect, but the blocks don't line up so don't expect them to work amazingly.

Does this generate coherent images?

Yes, but text and general prompt adherence can be questionable. Example failure mode for text:

Was the cover image cherrypicked?

Of course.