Can BLIP-2 be run at half, or lower, precision on CPU?
Hi, when I set from_pretrained's torch_dtype=torch.float16 I get the following error back:
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
I've seen other instances of this error when running on the CPU, which is what I'm doing in this case. Is there a way to run this model at lower precision on the CPU?
Hi @jamprimoz, some float16 operations are indeed not supported out of the box on CPU. Can you try with bfloat16 instead?
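For reference, here is a minimal sketch of what that could look like. It assumes the public Salesforce/blip2-opt-2.7b checkpoint and a local example.jpg; swap in whichever checkpoint and image you are actually using:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Load BLIP-2 in bfloat16, which CPU kernels generally support,
# unlike float16 (e.g. slow_conv2d_cpu has no Half implementation).
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.bfloat16
)

image = Image.open("example.jpg")  # any local image file
inputs = processor(images=image, return_tensors="pt")
# Cast the pixel values to match the model's dtype.
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

generated_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)
```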
Not only did it run, but it went from taking 14 seconds per picture to 2.5.
Thank you so very much for your help!
And of course I think of the follow-up right after closing the thread :-D
Will bfloat16 work when there is a GPU involved, or should I use something like:
use_dtype = "torch.float16" if torch.cuda.is_available() else "torch.bfloat16"
to set my torch_dtype if I want this to run on GPU when it's available?
EDIT: just grabbed the torch dtypes directly instead of as strings:
device_dtype = torch.float16 if torch.cuda.is_available() else torch.bfloat16
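In case it helps anyone else, here is a sketch of how that line could plug into from_pretrained. The checkpoint name is just an example, and moving the model with .to(device) is one common pattern, not the only one:

```python
import torch
from transformers import Blip2ForConditionalGeneration

# Use float16 on GPU (widely supported there) and bfloat16 on CPU,
# where several float16 kernels are missing.
device = "cuda" if torch.cuda.is_available() else "cpu"
device_dtype = torch.float16 if torch.cuda.is_available() else torch.bfloat16

model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=device_dtype
).to(device)
```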
Hi @jamprimoz! I am not sure about that, but I think float16 is indeed faster than bfloat16 on GPU, so you might consider that option as well.
Thanks again!