Reasons for using two-stage training: why train first at a resolution of 384 instead of 1024?

#15
by zijun - opened

It seems that the training has two stages:
stage 1: 20,000 steps with resolution of 384
stage 2: 20,000 steps with resolution of 1024
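
To make the schedule concrete, here is a minimal sketch of a step-to-resolution lookup. The step counts and resolutions come from the two stages listed above; the names `STAGES` and `resolution_at` are hypothetical and are not taken from the actual training script.

```python
# Hypothetical description of the schedule: (number of steps, image resolution).
# The values mirror the two stages listed above; the names are illustrative only.
STAGES = [
    (20_000, 384),   # stage 1
    (20_000, 1024),  # stage 2
]


def resolution_at(step: int) -> int:
    """Return the training resolution for a 0-indexed global step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    boundary = 0
    for num_steps, resolution in STAGES:
        boundary += num_steps
        if step < boundary:
            return resolution
    raise ValueError(f"step {step} is past the end of the schedule")
```

Under this sketch, training at 1024 only would be the single-entry schedule `[(40_000, 1024)]`.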

What is the reason for training at 384 resolution in stage 1? Why not just train at 1024 resolution for 40,000 steps?
Is there any ablation study or experiment report showing that two-stage training is necessary?

There were no super rigorous reasons for doing the two-stage training. We just found that training at 1024 helped with sample quality. iirc, training at 1024 alone was also sufficient.

williamberman changed discussion status to closed
