Why is the dtype bfloat16?

#61
by Macropodus - opened

Are there any considerations behind this choice?

Hi @Macropodus
That is the dtype the model was trained in. It also makes distribution much more convenient, since in fp32 the model weight files would take ~32GB instead of ~16GB.
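
A quick back-of-the-envelope check of those sizes (the ~8B parameter count below is an assumption inferred from the ~16GB figure, not stated in the thread):

```python
# Rough checkpoint-size arithmetic: parameter count times bytes per parameter.
# The 8e9 parameter count is an assumed figure, chosen to match the ~16 GB number above.
params = 8e9

bytes_per_param = {"fp32": 4, "bf16": 2, "fp16": 2}
for dtype, nbytes in bytes_per_param.items():
    size_gb = params * nbytes / 1e9
    print(f"{dtype}: ~{size_gb:.0f} GB")
# fp32: ~32 GB, bf16/fp16: ~16 GB -- matching the sizes mentioned above.
```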

@ybelkada
Got it, but these weights may not be well suited for fine-tuning; most LLM weights are released in fp16. When I load the weights in fp16, the loss always becomes NaN after a few fine-tuning steps; fp32 works fine.
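
For context (this explanation is not from the thread itself): NaN losses in fp16 are typically a dynamic-range issue, since fp16 overflows far earlier than bf16 or fp32. A quick way to compare the ranges, assuming PyTorch is available:

```python
import torch

# bf16 keeps fp32's exponent range (max ~3.4e38) at reduced precision,
# while fp16 tops out around 6.5e4 -- large activations or gradients
# overflow to inf, which then propagates to NaN in the loss.
for dtype in (torch.float32, torch.bfloat16, torch.float16):
    info = torch.finfo(dtype)
    print(dtype, "max:", info.max, "eps:", info.eps)
```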

Hi @Macropodus
Thanks for getting back. To use the float16 model, you can load it with revision="float16" in from_pretrained.
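
A minimal sketch of that loading call, assuming the standard transformers AutoModelForCausalLM API; the repo id below is a placeholder, not the actual repository this discussion belongs to:

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder repo id -- substitute the actual model repository.
model_id = "org-name/model-name"

# Load the fp16 weights from the dedicated "float16" branch of the repo,
# as suggested above, instead of the default bfloat16 weights on main.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="float16",
    torch_dtype=torch.float16,
)
```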

Macropodus changed discussion status to closed
