stabilityai/stable-diffusion-3-medium · What are the t5xxl models required for, and how do they differ apart from size? Thanks

tetsujin007

Jun 12

What are the t5xxl models used for?

Ozks

Jun 12

I think it is for the text. See this video at the 2:44 mark:
https://www.youtube.com/watch?v=xMQT9o97shA

YaTharThShaRma999

Jun 12

@tetsujin007
Diffusion models need text encoders to understand the prompt. Using more text encoders, and larger ones, seems to improve performance.

This repository provides two variants of the SD3 model: one with two text encoders, and one with three (including t5xxl).
The variant with three text encoders (including t5xxl) is slightly better at prompt following, rendering text in images, and overall quality. The difference is small, but it is there. For best performance, use the one with t5xxl. If you don't have enough VRAM, use the smaller one.
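
If you load the model through the diffusers library rather than from the single-file checkpoints, the same choice can be sketched roughly like this. Treat it as a rough sketch, not the official recipe: the helper names `sd3_text_encoder_overrides` and `load_sd3` are made up for illustration, and the repo id `stabilityai/stable-diffusion-3-medium-diffusers` and the `float16` dtype are assumptions on my part. The `text_encoder_3=None` / `tokenizer_3=None` arguments are how `StableDiffusion3Pipeline` is told to skip the T5 encoder.

```python
def sd3_text_encoder_overrides(use_t5: bool) -> dict:
    """Hypothetical helper: kwargs that keep or drop the third (T5-XXL) text encoder."""
    if use_t5:
        # No overrides: the pipeline loads all three text encoders.
        return {}
    # Passing None for both skips loading the T5 encoder and its tokenizer,
    # which lowers memory use at some cost in prompt following.
    return {"text_encoder_3": None, "tokenizer_3": None}


def load_sd3(use_t5: bool,
             repo_id: str = "stabilityai/stable-diffusion-3-medium-diffusers"):
    """Hypothetical loader; repo_id and dtype are assumptions, adjust to your setup."""
    # Imported here so the helper above can be used without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusion3Pipeline

    return StableDiffusion3Pipeline.from_pretrained(
        repo_id,
        torch_dtype=torch.float16,
        **sd3_text_encoder_overrides(use_t5),
    )
```

With enough VRAM you would call `load_sd3(use_t5=True).to("cuda")`; on a smaller card, `load_sd3(use_t5=False).to("cuda")` runs with only the two smaller text encoders.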