Can blip2 be run at half, or lower, precision on CPU?

#29
by jamprimoz - opened

Hi, when I set from_pretrained's torch_dtype=torch.float16 I get the following error back:

RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'

I've seen some other instances where this error comes up when running on the CPU, which is what I'm doing in this case. Is there a way to run this model in lower precision on the CPU?

Hi @jamprimoz
Some float16 operations may indeed not be supported out of the box on CPU. Can you try bfloat16 instead?
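
For reference, a minimal sketch of what loading in bfloat16 on CPU might look like. The checkpoint name `Salesforce/blip2-opt-2.7b` and the helper name `load_blip2_on_cpu` are assumptions for illustration; the thread does not say which checkpoint is in use.

```python
def load_blip2_on_cpu(model_id: str = "Salesforce/blip2-opt-2.7b"):
    """Load a BLIP-2 processor and model with bfloat16 weights for CPU inference.

    The default model_id is an assumption; substitute your own checkpoint.
    """
    # Imported lazily so that defining the helper does not require the
    # heavy dependencies to be installed.
    import torch
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    processor = Blip2Processor.from_pretrained(model_id)
    # torch_dtype=torch.bfloat16 replaces the float16 setting that raised
    # the "slow_conv2d_cpu" error on CPU.
    model = Blip2ForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    )
    model.eval()  # inference only
    return processor, model
```

The inputs likely need to be cast to the same dtype as the weights before calling `generate`, for example `processor(images=image, return_tensors="pt").to("cpu", torch.bfloat16)`.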

Not only did it run, but it went from taking 14 seconds a picture to 2.5.

Thank you so very much for your help!

jamprimoz changed discussion status to closed
jamprimoz changed discussion status to open

And of course I think of the follow up right after closing the thread :-D

Will bfloat16 work when there is a GPU involved or should I use something like:

use_dtype= "torch.float16" if torch.cuda.is_available() else "torch.bfloat16"

to set my torch_dtype if I want this to run on GPU when it's available?

EDIT: just grabbed the torch dtypes directly instead of as strings

device_dtype = torch.float16 if torch.cuda.is_available() else torch.bfloat16

Hi @jamprimoz !
I am not sure about that, but I think float16 is faster than bfloat16 on GPU, so you might consider that option as well.
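
A small sketch of the device-dependent choice discussed above: float16 when CUDA is available, bfloat16 otherwise. The helper name `pick_dtype_name` is hypothetical, and the torch part only runs if torch is installed.

```python
import importlib.util


def pick_dtype_name(cuda_available: bool) -> str:
    """Return the name of the torch dtype to use: float16 on GPU, bfloat16 on CPU."""
    return "float16" if cuda_available else "bfloat16"


# Only touch torch if it is actually installed.
if importlib.util.find_spec("torch") is not None:
    import torch

    has_cuda = torch.cuda.is_available()
    device = "cuda" if has_cuda else "cpu"
    # Resolve the name to the real dtype object, e.g. torch.float16.
    device_dtype = getattr(torch, pick_dtype_name(has_cuda))
    print(device, device_dtype)
```

The resulting `device_dtype` can then be passed as `torch_dtype` to `from_pretrained`, with the model and inputs moved to `device`.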

Thanks again!

jamprimoz changed discussion status to closed
