training an e-diffi reproduction with ssd-1b

#40
by bghira - opened

hello, i added support for ssd-1b into simpletuner and then added additional functionality to split the training between two models that use separate timestep ranges.
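a rough sketch of what "separate timestep ranges" can look like in practice, for anyone unfamiliar with the idea. this is not SimpleTuner's code: the names `pick_expert` and `sample_training_timestep`, the boundary of 200, and the assumption that stage 1 takes the high-noise end are all hypothetical and only here for illustration.

```python
import random

NUM_TRAIN_TIMESTEPS = 1000  # usual DDPM-style training range (assumed)
BOUNDARY = 200              # hypothetical split point, not taken from the post


def pick_expert(timestep, boundary=BOUNDARY):
    """Return which model should denoise at this timestep."""
    if not 0 <= timestep < NUM_TRAIN_TIMESTEPS:
        raise ValueError(f"timestep {timestep} out of range")
    # assumption: high timesteps (mostly noise) go to stage 1,
    # low timesteps (fine detail) go to stage 2
    return "stage_1" if timestep >= boundary else "stage_2"


def sample_training_timestep(stage, boundary=BOUNDARY, rng=random):
    """Draw a timestep only from the range owned by the given stage."""
    if stage == "stage_1":
        return rng.randrange(boundary, NUM_TRAIN_TIMESTEPS)
    if stage == "stage_2":
        return rng.randrange(0, boundary)
    raise ValueError(f"unknown stage {stage!r}")
```

each model only ever sees timesteps from its own range during training, and at inference the sampler hands the latents from one model to the other when it crosses the boundary.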

during that time i attempted to re-parameterise the weights of ssd-1b into v-prediction using a zero-terminal snr noise schedule.
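for context, a minimal pure-python sketch of the two pieces involved, assuming the schedule rescaling described in "Common Diffusion Noise Schedules and Sample Steps are Flawed" (Lin et al.) and the standard v-prediction target. the function names and the scaled-linear beta defaults are assumptions for illustration; a real trainer would do this on tensors, and this is not necessarily how simpletuner implements it.

```python
import math


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    # assumed stable-diffusion-style schedule: linear in sqrt(beta)
    s, e = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(s + (e - s) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def rescale_zero_terminal_snr(betas):
    # cumulative product of alphas (alpha_bar_t)
    alphas_bar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alphas_bar.append(prod)
    sqrt_ab = [math.sqrt(a) for a in alphas_bar]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # shift so the final value is exactly zero, then scale so the first is unchanged
    sqrt_ab = [(x - last) * first / (first - last) for x in sqrt_ab]
    ab = [x * x for x in sqrt_ab]
    # convert alpha_bar back into per-step alphas, then betas
    alphas = [ab[0]] + [ab[i] / ab[i - 1] for i in range(1, len(ab))]
    return [1.0 - a for a in alphas]


def v_target(x0, noise, alpha_bar):
    # v = sqrt(alpha_bar) * noise - sqrt(1 - alpha_bar) * x0, elementwise
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * n - s * x for x, n in zip(x0, noise)]
```

with the rescaled schedule the last timestep is pure noise (alpha_bar of zero), where an epsilon-prediction target carries no information about the image. that is why the rescaling is paired with v-prediction: at that step the target reduces to `-x0`.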

it worked within 800 steps or so, which i didn't expect. this model trains very quickly. the last time i attempted a model reparameterisation was with SDXL 0.9, and it always ran into mode collapse.

the lightweight nature of this model is greatly appreciated: an A6000 can meaningfully train it, instead of having to rent or purchase an A100.

here are some examples of the Segmind e-Diffi combined with my foundational model "Terminus" as stage 1. these weights are still undertrained.

image.png

image.png

image.png

an example of how to reproduce the results is here: https://github.com/bghira/SimpleTuner/blob/main/documentation/MIXTURE_OF_EXPERTS.md

Segmind org

Great work!
