Spaces:
Runtime error
Getting this error when copying it to my space.
Are you running it on GPU?
CPU
This is working for CPU, but the processing takes 15 minutes for Redshift Render
https://huggingface.co/spaces/Omnibus/finetuned_diffusion_cpu
@anzorq :
I had that exact same error when running a duplicate of your space on a T4.
Just copying & running on a CPU produces this totally unhelpful error :
You should be able to fix the "layernormkernelimpl not implemented for 'half'" error by going to the following lines in your requirements.txt :
torch
torchvision==0.13.1+cu113
Now, replace them with this :
torch==1.12.1+cu113
torchvision==0.13.1+cu113
That should fix that particular bug.
After I fixed that, however, my app broke on the following line :
pipe.enable_xformers_memory_efficient_attention()
I tried fixing this by taking the following lines :
if torch.cuda.is_available():
    pipe = pipe.to("cuda")
    pipe.enable_xformers_memory_efficient_attention()
And then I replaced them with this :
def to_cuda(torch, pipe):
    try:
        if torch.cuda.is_available():
            pipe = pipe.to("cuda")
            pipe.enable_xformers_memory_efficient_attention()
        return True
    except Exception:
        return False

to_cuda(torch, pipe)
After that, the app started successfully. However, it was still quite unstable and kept throwing errors.
@Omnibus :
At least your version works.
Unfortunately it's also very slow when running on a T4.
It should be optimized so that it runs on a CPU (slowly) and fast on a GPU.
I found that switching all of the "torch_dtype=torch.float16" occurrences to "torch_dtype=torch.get_default_dtype()" allows the program to run on CPU and removes the "layernormkernelimpl not implemented for 'half'" error when running on CPU, along with changing the requirements to download CPU-compatible modules, as you mentioned.
I sense that in order for the program to perform on both CPU and GPU, a toggle like this might work:
"if device == GPU: torch_dtype = torch.float16 elif device == CPU: torch_dtype = torch.get_default_dtype()"
This demo was meant to be run on GPU only. If you want to run it on CPU, follow the above instructions.
@anzorq :
Please read my previous comment.
As I already explained, I tried running it in a GPU-enabled environment (T4 small --- 4 vCPU / 15 GiB RAM / Nvidia T4) and I got the same error as the OP.
As I also already explained, I got rid of the "layernormkernelimpl not implemented for 'half'" error by adding a torch version compatible with the torchvision version.
However, after that it produces a different error :
Traceback (most recent call last):
File "app.py", line 52, in <module>
pipe.enable_xformers_memory_efficient_attention()
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 870, in enable_xformers_memory_efficient_attention
self.set_use_memory_efficient_attention_xformers(True, attention_op)
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 895, in set_use_memory_efficient_attention_xformers
fn_recursive_set_mem_eff(module)
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 886, in fn_recursive_set_mem_eff
module.set_use_memory_efficient_attention_xformers(valid, attention_op)
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 208, in set_use_memory_efficient_attention_xformers
fn_recursive_set_mem_eff(module)
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 204, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 204, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 204, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 201, in fn_recursive_set_mem_eff
module.set_use_memory_efficient_attention_xformers(valid, attention_op)
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/attention.py", line 117, in set_use_memory_efficient_attention_xformers
raise e
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/attention.py", line 111, in set_use_memory_efficient_attention_xformers
_ = xformers.ops.memory_efficient_attention(
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/xformers/ops/memory_efficient_attention.py", line 967, in memory_efficient_attention
return op.forward_no_grad(
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/xformers/ops/memory_efficient_attention.py", line 343, in forward_no_grad
return cls.FORWARD_OPERATOR(
File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/torch/_ops.py", line 143, in __call__
return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
Wrapping a try ... except around pipe.enable_xformers_memory_efficient_attention() fixes that error as well, as I also already explained.
That's as far as I got fixing errors myself. Will continue testing / developing soon.
This space installs A10G-specific prebuilt xformers whl. To use it on T4 you need to either disable xformers or install xformers for T4.
I kinda suspected that's why it broke on that line. I experienced a similar (possibly the same) issue with another app on Google Colab a while ago.
Either way, my fix will stop the app from breaking the moment the app is downgraded from A10G to T4.
This is especially important for people who, like myself, are running their environment on a community GPU grant, as the environment can be changed by Hugging Face at any time without the author being aware of it.
If you don't care about this, that's fine, I guess. It's your app. But in that case you might want to make people aware that your app is designed to work as-is on Hugging Face in A10G environments only, and will break in both T4 and CPU-only environments without certain adjustments. This would save lots of people headaches when trying to duplicate your space.
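One defensive option, sketched below under assumptions: since the prebuilt xformers wheel targets a specific GPU, the app could check the detected device name before enabling xformers, so a downgrade from A10G to T4 (or to CPU) degrades gracefully instead of crashing. The helper name and the "A10G" string check are illustrative, not from the original app.

```python
import torch

def maybe_enable_xformers(pipe):
    """Enable xformers attention only when the GPU matches the wheel's target.

    Returns True if xformers was enabled, False otherwise.
    """
    if not torch.cuda.is_available():
        return False
    gpu_name = torch.cuda.get_device_name(0)
    # The prebuilt wheel in this space targets the A10G; skip on other GPUs.
    if "A10G" in gpu_name:
        pipe.enable_xformers_memory_efficient_attention()
        return True
    return False
```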
How should I do that last bit? The same way that I would switch all of the "torch_dtype=torch.float16" to "torch_dtype=torch.get_default_dtype()"?