---
base_model: black-forest-labs/FLUX.1-dev
base_model_relation: merge
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
tags:
- text-to-image
- image-generation
- flux
- merge
---

![Main cover](./raw/img_main_cover.jpg)

# Do you feel like you have too much VRAM lately? Want to OOM on a 40GB A100? **This is the model for you!**

---

# **About**

This is a **17B self-merge** of the original 12B parameter [Flux.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) model.

Merging was done similarly to 70B->120B LLM merges, with the layers repeated and interwoven in groups.

```
Final model stats:
 p layers: [    32]
 s layers: [    44]
 n params: [17.17B]
```
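As a rough sanity check on the stats above, the parameter count can be re-derived from the layer counts. This is only a sketch: the hidden size (3072), the 4x MLP width, the 128-dim heads, the embedder shapes and the original 19/38 layer counts are assumptions about the Flux.1-dev architecture rather than anything read from this repo, and the helper names are made up for illustration.

```python
# Assumed Flux.1-dev dimensions (not taken from this repo).
HIDDEN = 3072          # model width
MLP = 4 * HIDDEN       # feed-forward width
HEAD_DIM = 128         # per-head size; QK-norm keeps one scale vector for q and one for k


def linear(n_in, n_out):
    """Parameters of a dense layer with bias."""
    return n_in * n_out + n_out


def double_block():
    """One double-stream block: identical image and text streams."""
    stream = (
        linear(HIDDEN, 6 * HIDDEN)    # modulation
        + linear(HIDDEN, 3 * HIDDEN)  # fused qkv
        + 2 * HEAD_DIM                # q/k norm scales
        + linear(HIDDEN, HIDDEN)      # attention output projection
        + linear(HIDDEN, MLP)         # MLP up
        + linear(MLP, HIDDEN)         # MLP down
    )
    return 2 * stream


def single_block():
    """One single-stream block with fused attention + MLP projections."""
    return (
        linear(HIDDEN, 3 * HIDDEN)          # modulation
        + linear(HIDDEN, 3 * HIDDEN + MLP)  # fused qkv + MLP up
        + linear(HIDDEN + MLP, HIDDEN)      # fused output projection
        + 2 * HEAD_DIM                      # q/k norm scales
    )


def embedders_and_head():
    """Everything outside the repeated blocks (assumed input sizes)."""
    timestep = linear(256, HIDDEN) + linear(HIDDEN, HIDDEN)
    guidance = timestep                      # same shape as the timestep embedder
    pooled_text = linear(768, HIDDEN) + linear(HIDDEN, HIDDEN)
    img_in = linear(64, HIDDEN)
    txt_in = linear(4096, HIDDEN)
    final = linear(HIDDEN, 2 * HIDDEN) + linear(HIDDEN, 64)
    return timestep + guidance + pooled_text + img_in + txt_in + final


def total_params(n_double, n_single):
    return n_double * double_block() + n_single * single_block() + embedders_and_head()


print(f"original (19/38): {total_params(19, 38) / 1e9:.2f}B")
print(f"merged   (32/44): {total_params(32, 44) / 1e9:.2f}B")
```

Under those assumptions the two lines come out as 11.90B and 17.17B, which lines up with the stats block.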

## Training

Some post-merge training was done to try to reverse the extensive brain damage the model has suffered, but even after that this is mostly a **proof of concept**, since I don't have any hardware capable of properly training this. Still, I think it might be the first open source 17B image model that actually generates coherent images, even if it's just a self-merge.

You can see the text recovering with training. The leftmost image is the base merge at step 0:

![Training](./raw/img_train_recovery.jpg)

---

# **Usage**

*Good luck.*

### Diffusers

It should work with the inference pipeline, but `from_single_file` seems to need the custom layer counts passed explicitly:

```
from diffusers import FluxTransformer2DModel
model = FluxTransformer2DModel.from_single_file("flux.1-heavy-17B.safetensors", num_layers=32, num_single_layers=44)
```
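For a full pipeline, something along these lines might work. It is an untested sketch: `load_heavy_pipeline` is a hypothetical helper, pulling the text encoders, VAE and scheduler from the base `black-forest-labs/FLUX.1-dev` repo is an assumption, and so is the choice of `enable_model_cpu_offload()` to keep VRAM use down.

```python
def load_heavy_pipeline(weights_path="flux.1-heavy-17B.safetensors"):
    """Build a FluxPipeline around the 17B transformer (hypothetical helper)."""
    # Heavy imports live inside the function so defining it costs nothing.
    import torch
    from diffusers import FluxPipeline, FluxTransformer2DModel

    transformer = FluxTransformer2DModel.from_single_file(
        weights_path,
        num_layers=32,         # "p layers" in the stats block
        num_single_layers=44,  # "s layers" in the stats block
        torch_dtype=torch.bfloat16,
    )
    # Assumption: every other component is reused from the base model repo.
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    )
    # Moves each component to the GPU only while it is running.
    pipe.enable_model_cpu_offload()
    return pipe
```

Usage would then be `pipe = load_heavy_pipeline()` followed by `pipe("your prompt").images[0]`, with the same sampler settings as regular Flux.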

### Comfy

Just load it normally via the "Load Diffusion Model" node. You need like 80GB of system RAM on Windows for it to not swap to disk lol.

It requires about 35-40GB of VRAM for inference, assuming you offload the text encoder and unload it during VAE decoding. Partial offloading works if you have enough system RAM.
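The VRAM figure is roughly what the weights alone predict. Here is a back-of-the-envelope sketch, assuming 17.17B parameters at 2 bytes each in BF16 (1 byte for a hypothetical FP8 copy) and ignoring activations, the VAE and framework overhead:

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_size_gb(n_params, dtype="bf16"):
    """Size of the raw weights in decimal gigabytes (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


N_PARAMS = 17.17e9  # from the stats block

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_size_gb(N_PARAMS, dtype):.2f} GB")
```

That puts the BF16 transformer at about 34 GB before anything else is loaded, which is why the text encoder has to be moved off the GPU to stay in the 35-40GB range.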

### Training

Seems to work out of the box with [ostris/ai-toolkit](https://github.com/ostris/ai-toolkit), at least it did when I pointed `config -> process -> model -> name_or_path` to it in a local folder.

---

# **Q&A**:

## Should I use this model?

No, unless you want to brag about it or, God forbid, train it into something usable.

## Where is the merge script?

It's a mess of like 3-4 scripts and some questionable manual editing on some of the biases. You can replicate it if you put the layers after each other with some overlap similarly to [this](https://huggingface.co/alpindale/goliath-120b#merge-process), just leave the later single layers alone.
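To make the "layers after each other with some overlap" idea concrete, here is a minimal sketch of how such a layer order could be generated. It is not the script used for this model: `overlapped_schedule`, the window and stride values, and the `keep_tail` parameter are all made up for illustration, and the example numbers do not reproduce the 32/44 layer counts.

```python
def overlapped_schedule(n_layers, window, stride, keep_tail=0):
    """Return source-layer indices for an overlapping self-merge.

    Layers are copied in slices of `window`, each slice starting `stride`
    layers after the previous one, so layers in the overlap appear twice.
    The last `keep_tail` layers are appended once, untouched.
    """
    assert 0 < stride <= window, "stride must not skip layers"
    assert 0 <= keep_tail <= n_layers
    body = n_layers - keep_tail
    order = []
    start = 0
    while True:
        end = min(start + window, body)
        order.extend(range(start, end))
        if end == body:
            break
        start += stride
    order.extend(range(body, n_layers))  # tail layers are left alone
    return order


# Toy example: 8 layers, slices of 4 with stride 2, last 2 layers kept as-is.
print(overlapped_schedule(8, window=4, stride=2, keep_tail=2))
# [0, 1, 2, 3, 2, 3, 4, 5, 6, 7]
```

Each index in the result names the original layer whose weights would be copied into that position of the merged model.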

The merged (untrained) weights are in this repo in the `raw` folder. If you need them in the diffusers format, load them with `FluxTransformer2DModel.from_single_file` and write them back out with `save_pretrained`.
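A conversion along those lines could look like the sketch below. `convert_to_diffusers` is a hypothetical helper, both paths are placeholders you would fill in yourself, and loading in BF16 is an assumption.

```python
def convert_to_diffusers(single_file_path, output_dir):
    """Load a single-file checkpoint and save it in diffusers format."""
    # Heavy imports live inside the function so defining it costs nothing.
    import torch
    from diffusers import FluxTransformer2DModel

    model = FluxTransformer2DModel.from_single_file(
        single_file_path,
        num_layers=32,         # custom layer counts, as in the Usage section
        num_single_layers=44,
        torch_dtype=torch.bfloat16,
    )
    # Writes config.json plus sharded safetensors files into output_dir.
    model.save_pretrained(output_dir)
    return output_dir
```

Expect this to need enough system RAM to hold the whole transformer at once.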

## GGUF? FP8?

It's essential to run this model in BF16 precision for the full experience of running out of every kind of resource at the same time.

(I'd put some up but I'm out of runpod credits again)

## Settings? LoRA compatibility?

Just use the same settings you'd use for regular flux. LoRAs do seem to have at least some effect, but the blocks don't line up so don't expect them to work amazingly.

## Does this generate coherent images?

Yes, but text rendering and general prompt adherence can be questionable. Example failure mode for text:

![Text failure example](./raw/img_text_fail.jpg)

## Was the cover image cherrypicked?

Of course.