Is it possible to share training skills or training parameters?
Hello @Midu , thank you for sharing. Is the Chinese-style fine-tuning script based on the "diffusers" script? Would it be possible to share your training tips or training parameters?
Hi @nipi ,
Yes, my script is based on diffusers for its good DeepSpeed support. I did not tune the hyperparameters much; I only used a low learning rate (6e-5 at a batch size of 256).
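To make those numbers concrete (the learning rate and the 256 effective batch size come from the reply above; the per-device batch size, accumulation steps, and GPU count are only an illustrative split):

```python
learning_rate = 6e-5             # from the reply above
per_device_batch_size = 8        # illustrative
gradient_accumulation_steps = 4  # illustrative
num_gpus = 8                     # illustrative

# The effective (global) batch size works out to 256.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
assert effective_batch_size == 256

# In a diffusers-style script this learning rate would typically feed AdamW, e.g.:
#   optimizer = torch.optim.AdamW(unet.parameters(), lr=learning_rate)
```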
Hello @Midu ,
Thank you for responding. The scheduler in the model's scheduler config.json is EulerDiscreteScheduler with prediction_type set to v_prediction, but diffusers' EulerDiscreteScheduler does not implement the get_velocity method. Did you use DDPMScheduler for fine-tuning? Also, did you freeze the gradients of the text encoder and VAE during training and fine-tune only the UNet?
Hi @Midu ,
I can run the script normally at a batch size of 256 using ZeRO stage 3 and PyTorch Lightning, but I am not sure how PyTorch Lightning handles the EMA strategy across multiple GPUs, and I hit a dimension issue when directly modifying the diffusers text_to_image script. Could you please share the fine-tuning code? If that is not possible, please send me a private message (my email address is by_nipi@163.com).
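For reference, the EMA update I am trying to reproduce follows the diffusers training utilities roughly like this (a sketch; the decay value and the parameters-based EMAModel interface are my assumptions, and how this should interact with Lightning's DDP wrapping is exactly what I am unsure about):

```python
from diffusers import UNet2DConditionModel
from diffusers.training_utils import EMAModel

model_path = "path/to/the/released/checkpoint"  # placeholder

unet = UNet2DConditionModel.from_pretrained(model_path, subfolder="unet")

# Shadow copy of the UNet weights; the decay value is a common default, not from this thread.
ema_unet = EMAModel(unet.parameters(), decay=0.9999)

# In the training loop, update the EMA copy after each optimizer step:
#   optimizer.step()
#   ema_unet.step(unet.parameters())
#
# Before saving, copy the averaged weights back into the UNet:
#   ema_unet.copy_to(unet.parameters())
```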
When fine-tuning, I use DDPMScheduler to schedule the noise and to compute the target with its get_velocity function, and I use EulerDiscreteScheduler only for sampling.
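A minimal sketch of that setup in a diffusers-style training step (the checkpoint path is a placeholder, and the text-encoder class and the freezing of the VAE/text encoder are assumptions carried over from the question above, not confirmed here):

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel

model_path = "path/to/the/released/checkpoint"  # placeholder, not an actual repo id

# DDPMScheduler is used only at training time: it implements add_noise() and
# get_velocity(), which EulerDiscreteScheduler (kept for sampling) does not.
noise_scheduler = DDPMScheduler.from_pretrained(model_path, subfolder="scheduler")

vae = AutoencoderKL.from_pretrained(model_path, subfolder="vae")
text_encoder = CLIPTextModel.from_pretrained(model_path, subfolder="text_encoder")  # assumption: the checkpoint may use a different text-encoder class
unet = UNet2DConditionModel.from_pretrained(model_path, subfolder="unet")

# Assumption (asked above but not confirmed in this thread): freeze the VAE and
# text encoder, train only the UNet.
vae.requires_grad_(False)
text_encoder.requires_grad_(False)

def training_loss(pixel_values, input_ids):
    # Encode images into latents with the frozen VAE.
    latents = vae.encode(pixel_values).latent_dist.sample()
    latents = latents * 0.18215  # SD VAE scaling factor (vae.config.scaling_factor in newer diffusers)

    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    ).long()

    # DDPMScheduler schedules the noise during training ...
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    # ... and provides the v-prediction target.
    target = noise_scheduler.get_velocity(latents, noise, timesteps)

    encoder_hidden_states = text_encoder(input_ids)[0]
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
    return F.mse_loss(model_pred.float(), target.float())
```

At inference time the pipeline keeps EulerDiscreteScheduler as configured in the checkpoint, since get_velocity is only needed to build the training target.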