Why is the dtype bfloat16?

#61
by Macropodus - opened

Are there any considerations behind this choice?

Hi @Macropodus
That is the dtype the model was trained in. It also makes distribution much more convenient, since in fp32 the model weight files would take ~32GB instead of ~16GB.
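
A quick back-of-the-envelope check of those sizes (the ~8B parameter count below is an assumption inferred from the ~16GB figure, not stated in the thread):

```python
# Rough checkpoint-size arithmetic: parameter count times bytes per parameter.
# The 8e9 parameter count is an assumed figure, chosen to match the ~16 GB number above.
params = 8e9

bytes_per_param = {"fp32": 4, "bf16": 2, "fp16": 2}
for dtype, nbytes in bytes_per_param.items():
    size_gb = params * nbytes / 1e9
    print(f"{dtype}: ~{size_gb:.0f} GB")
# fp32: ~32 GB, bf16/fp16: ~16 GB -- matching the sizes mentioned above.
```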

@ybelkada
Got it, but these weights may not be well suited for fine-tuning; most LLM weights are released in fp16. When I load the weights in fp16, the loss always becomes NaN after a few fine-tuning steps; fp32 works fine.
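
For context (this explanation is not from the thread itself): NaN losses in fp16 are typically a dynamic-range issue, since fp16 overflows far earlier than bf16 or fp32. A quick way to compare the ranges, assuming PyTorch is available:

```python
import torch

# bf16 keeps fp32's exponent range (max ~3.4e38) at reduced precision,
# while fp16 tops out around 6.5e4 -- large activations or gradients
# overflow to inf, which then propagates to NaN in the loss.
for dtype in (torch.float32, torch.bfloat16, torch.float16):
    info = torch.finfo(dtype)
    print(dtype, "max:", info.max, "eps:", info.eps)
```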

Hi @Macropodus
Thanks for getting back. To use the float16 model, you can load it with revision="float16" in from_pretrained.
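
A minimal sketch of that loading call, assuming the standard transformers AutoModelForCausalLM API; the repo id below is a placeholder, not the actual repository this discussion belongs to:

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder repo id -- substitute the actual model repository.
model_id = "org-name/model-name"

# Load the fp16 weights from the dedicated "float16" branch of the repo,
# as suggested above, instead of the default bfloat16 weights on main.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="float16",
    torch_dtype=torch.float16,
)
```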

Macropodus changed discussion status to closed
