Is it possible to create a smaller version, like 8-bit or FP16?

Opened by BriggoBoy

Would be great.

Assuming this has an architecture like SD3, it would also be nice to have a version separate from the text encoders, so that files for finetunes and so on can be much smaller.

I third these statements. It's a super good model, it really follows the prompt exactly, BUT man is it large lol.

@BriggoBoy @sanguivore @colinw2292
diffusers supports quantization now: https://huggingface.co/blog/quanto-diffusers

Memory savings are very impressive. For example:
PixArt-Sigma 1024x1024 (a DiT model similar to AuraFlow) usually takes 12 GB of VRAM, but with 8-bit quantization of the transformer and the text encoder it takes just 5 GB! The above also supports AuraFlow, SD3, and a few others.
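For reference, a minimal sketch along the lines of that quanto-diffusers blog post, quantizing the two largest components to 8-bit (the model ID and the qfloat8 choice follow the blog; the prompt is just a placeholder):

```python
import torch
from diffusers import PixArtSigmaPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

# Quantize the transformer and text encoder to 8-bit, then freeze the weights
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
quantize(pipe.text_encoder, weights=qfloat8)
freeze(pipe.text_encoder)

image = pipe("a photo of an astronaut riding a horse").images[0]
```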

Even just splitting the model into its parts might help, as SD3 is similar in size but runs far smoother and faster. That could come down to specifics I am not familiar with, of course, but still. Every image I generate, I feel like my GPU is producing X-rays.

@RustyRuins The actual image-generation part of SD3 is just 2B; it's actually smaller than SDXL. The reason SD3 seems so big is its T5 text encoder. However, since you only need to run the T5 text encoder once per image and most of the work is done by the image-generation part, it is pretty fast, faster than SDXL.

On the other hand, the actual image-generation part of AuraFlow is roughly 6B. AuraFlow's T5 text encoder is a lot smaller than SD3's, but since most of the work is done by the image-generation part, it's a lot slower than SD3 and SDXL.
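If you want to check that split yourself, here is a quick sketch (assuming the fal/AuraFlow repo and the AuraFlowPipeline from a recent diffusers release) that prints the parameter count of each major component:

```python
import torch
from diffusers import AuraFlowPipeline

pipe = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=torch.float16)

# Count parameters per component to see where the size actually lives
for name in ("transformer", "text_encoder", "vae"):
    module = getattr(pipe, name, None)
    if module is not None:
        params = sum(p.numel() for p in module.parameters())
        print(f"{name}: {params / 1e9:.2f}B parameters")
```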

I would highly recommend using torch.compile with AuraFlow and SD3, since that can boost speed by a whopping 4x! You could also use quantization as I showed above. To save even more memory, I believe you can use the taesdxl VAE (https://huggingface.co/madebyollin/taesdxl), which should save something like 2-3 GB of VRAM. With everything combined, you could maybe run AuraFlow with 8 GB of VRAM.
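A rough sketch of both ideas in diffusers (the compile settings and the AutoencoderTiny swap follow the usual diffusers recipes; I have not tested them on AuraFlow specifically, so treat this as an assumption):

```python
import torch
from diffusers import AuraFlowPipeline, AutoencoderTiny

pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow", torch_dtype=torch.float16
).to("cuda")

# Swap in the tiny VAE to cut decoder memory (quality may drop slightly)
pipe.vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesdxl", torch_dtype=torch.float16
).to("cuda")

# Compile the transformer; the first call is slow (warm-up), later calls are much faster
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)

image = pipe("a watercolor fox in a forest").images[0]
```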

@YaTharThShaRma999 Thank you for the explanation, it makes sense now.

I am running everything with ComfyUI, so I am not sure about the torch.compile part, or what it actually entails.
But I did try taesdxl and it did not work right; it resulted in heavy artifacting. This VAE (https://huggingface.co/madebyollin/sdxl-vae-fp16-fix), however, works perfectly and also claims to be a lot more resource-efficient.
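For anyone on diffusers rather than ComfyUI, loading that fixed VAE into a pipeline is short (a sketch; pairing it with AuraFlowPipeline is my assumption based on this thread, not something the model card states):

```python
import torch
from diffusers import AuraFlowPipeline, AutoencoderKL

# Load the fp16-safe SDXL VAE and attach it when building the pipeline
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)
pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow", vae=vae, torch_dtype=torch.float16
).to("cuda")
```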

@RustyRuins Could you share your workflow?

@BriggoBoy Sure thing. Do you mind custom node packs? The workflows I usually build and publish use a fair few, mostly popular ones most people have already anyway, but also some I find interesting and want to support by including them.
Or would you rather have a bare-bones version with default nodes as far as possible?
The workflow is not quite finished yet, as I have barely gotten started.

@RustyRuins I do not mind custom node packs, and take your time. Thanks in advance!

@RustyRuins thank you very much!
