
No answer, chat log blinks and disappears

#8
by VSFletch3r - opened

Hey,

I'm on a 2021 MacBook Pro M1 Max with 64 GB, on macOS 13.3.1.
I can launch the web UI normally and load the model.

But whatever I type in, the chat log simply blinks and disappears.


Traceback (most recent call last):
  File "/Users/x/Desktop/oobabooga_macos/text-generation-webui/modules/callbacks.py", line 73, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/Users/x/Desktop/oobabooga_macos/text-generation-webui/modules/text_generation.py", line 251, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/Users/x/Desktop/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/x/Desktop/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/Users/x/Desktop/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2521, in sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/Users/x/Desktop/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 736, in prepare_inputs_for_generation
    position_ids = attention_mask.long().cumsum(-1) - 1
RuntimeError: MPS does not support cumsum op with int64 input
Output generated in 0.19 seconds (0.00 tokens/s, 0 tokens, context 36, seed 639214276)
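
For reference, the failing call from the traceback seems reproducible outside the web UI with a few lines of Python (just a sketch, assuming a torch build that still lacks int64 cumsum on MPS):

import torch

# Minimal reproduction of the call from the traceback above; on affected
# torch builds the int64 cumsum raises on the MPS device.
if torch.backends.mps.is_available():
    attention_mask = torch.ones(1, 36, dtype=torch.long, device="mps")
    try:
        position_ids = attention_mask.long().cumsum(-1) - 1
    except RuntimeError as err:
        print("int64 cumsum failed on MPS:", err)
        # Workaround some people use: do the cumsum in int32 and cast back.
        position_ids = (attention_mask.to(torch.int32).cumsum(-1) - 1).long()
    print(position_ids)
else:
    print("MPS backend not available on this machine")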


I have tried "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu",
but it does not seem to fix the problem.

Any idea how to fix this? I am very new to my MacBook, so I might need detailed help.

Thanks in advance.

Here's a thread describing the issue: https://github.com/pytorch/pytorch/issues/96610

You need to make sure you're running macOS 13.3. If you already are, then it might be because torch is being installed into a different version of Python:

[screenshot]
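
If it helps, you can check which interpreter and torch build the web UI is actually picking up with a few lines of Python (run them with the same Python that launches the web UI, e.g. the one under installer_files/env):

import sys
import torch

print(sys.executable)                     # path of the Python actually running
print(torch.__version__)                  # nightly builds contain ".dev"
print(torch.backends.mps.is_available())  # can the MPS backend be used here?
print(torch.backends.mps.is_built())      # was this torch compiled with MPS support?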

Hey,

I'm getting an identical error message trying to run CodeLlama 7B on an M1 Pro. I've updated macOS to 13.5.2, and I'm running Python 3.10.9 and the pip for Python 3.10.

Not sure if this is relevant, but I am also getting the following message when I load the model using the WebUI (https://github.com/oobabooga/text-generation-webui):

UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'

Appreciate any thoughts on what is going wrong here.

This appears to have been resolved elsewhere; the fix requires installing the PyTorch nightlies, not the stable release.

https://github.com/pytorch/pytorch/issues/96610#issuecomment-1597314364
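
For anyone else applying that fix, a rough way to confirm the MPS backend actually works after the reinstall (a quick sketch, independent of the web UI) is to time a small layer on "cpu" versus "mps"; if the MPS run is not clearly faster, the backend is probably not being used:

import time
import torch
import torch.nn as nn

# Time a matmul-heavy layer on both devices.
layer = nn.Linear(4096, 4096)
x = torch.randn(64, 4096)
devices = ["cpu", "mps"] if torch.backends.mps.is_available() else ["cpu"]
for device in devices:
    m = layer.to(device)
    xd = x.to(device)
    start = time.time()
    for _ in range(50):
        _ = m(xd)
    if device == "mps":
        torch.mps.synchronize()  # wait for queued GPU work before stopping the timer
    print(device, round(time.time() - start, 3), "seconds")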

But having implemented the change, my inference time is still unusably slow at 0.02 tokens/s. Does anyone know why that might be? Thanks in advance.
