Can BLIP-2 be run at half, or lower, precision on CPU?
Hi, when I set from_pretrained's torch_dtype=torch.float16 I get the following error back:
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
I've seen other instances of this error when running on the CPU, which is what I'm doing in this case. Is there a way to run this model at lower precision on the CPU?
Hi @jamprimoz, some float16 operations are indeed not supported out of the box on CPU. Can you try with bfloat16 instead?
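For reference, here is a minimal sketch of what that could look like. It assumes the public Salesforce/blip2-opt-2.7b checkpoint and a local example.jpg; swap in whichever checkpoint and image you are actually using:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Load BLIP-2 in bfloat16, which CPU kernels generally support,
# unlike float16 (e.g. slow_conv2d_cpu has no Half implementation).
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.bfloat16
)

image = Image.open("example.jpg")  # any local image file
inputs = processor(images=image, return_tensors="pt")
# Cast the pixel values to match the model's dtype.
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

generated_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)
```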
Not only did it run, but it went from taking 14 seconds per picture to 2.5.
Thank you so very much for your help!
And of course I think of the follow-up right after closing the thread :-D
Will bfloat16 work when there is a GPU involved, or should I use something like:
use_dtype = "torch.float16" if torch.cuda.is_available() else "torch.bfloat16"
to set my torch_dtype if I want this to run on GPU when it's available?
EDIT: just grabbed the torch dtypes directly instead of as strings:
device_dtype = torch.float16 if torch.cuda.is_available() else torch.bfloat16
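In case it helps anyone else, here is a sketch of how that line could plug into from_pretrained. The checkpoint name is just an example, and moving the model with .to(device) is one common pattern, not the only one:

```python
import torch
from transformers import Blip2ForConditionalGeneration

# Use float16 on GPU (widely supported there) and bfloat16 on CPU,
# where several float16 kernels are missing.
device = "cuda" if torch.cuda.is_available() else "cpu"
device_dtype = torch.float16 if torch.cuda.is_available() else torch.bfloat16

model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=device_dtype
).to(device)
```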
Hi @jamprimoz! I am not sure about that, but I think float16 is indeed faster than bfloat16 on GPU, so you might consider that option as well.
Thanks again!