
No answer, chat log blinks and disappears

#8
by VSFletch3r - opened

Hey,

I'm on a 2021 MacBook Pro M1 Max with 64 GB, on macOS 13.3.1.
I can launch the web UI normally and load the model.

But whatever I type in, the chat log simply blinks and disappears.


Traceback (most recent call last):
  File "/Users/x/Desktop/oobabooga_macos/text-generation-webui/modules/callbacks.py", line 73, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/Users/x/Desktop/oobabooga_macos/text-generation-webui/modules/text_generation.py", line 251, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/Users/x/Desktop/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/x/Desktop/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/Users/x/Desktop/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2521, in sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/Users/x/Desktop/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 736, in prepare_inputs_for_generation
    position_ids = attention_mask.long().cumsum(-1) - 1
RuntimeError: MPS does not support cumsum op with int64 input
Output generated in 0.19 seconds (0.00 tokens/s, 0 tokens, context 36, seed 639214276)
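
For reference, the failing call from the traceback seems reproducible outside the web UI with a few lines of Python (just a sketch, assuming a torch build that still lacks int64 cumsum on MPS):

import torch

# Minimal reproduction of the call from the traceback above; on affected
# torch builds the int64 cumsum raises on the MPS device.
if torch.backends.mps.is_available():
    attention_mask = torch.ones(1, 36, dtype=torch.long, device="mps")
    try:
        position_ids = attention_mask.long().cumsum(-1) - 1
    except RuntimeError as err:
        print("int64 cumsum failed on MPS:", err)
        # Workaround some people use: do the cumsum in int32 and cast back.
        position_ids = (attention_mask.to(torch.int32).cumsum(-1) - 1).long()
    print(position_ids)
else:
    print("MPS backend not available on this machine")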


I have tried "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu",
but it does not seem to fix the problem.

Any idea how to fix this? I am very new to my MacBook, so I might need detailed help.

Thanks in advance.

Here's a thread describing the issue: https://github.com/pytorch/pytorch/issues/96610

You need to make sure you're running macOS 13.3. If you already are, then it might be because torch is being installed into a different version of Python:

[screenshot]
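
If it helps, you can check which interpreter and torch build the web UI is actually picking up with a few lines of Python (run them with the same Python that launches the web UI, e.g. the one under installer_files/env):

import sys
import torch

print(sys.executable)                     # path of the Python actually running
print(torch.__version__)                  # nightly builds contain ".dev"
print(torch.backends.mps.is_available())  # can the MPS backend be used here?
print(torch.backends.mps.is_built())      # was this torch compiled with MPS support?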

Hey,

I'm getting an identical error message trying to run CodeLlama 7B on an M1 Pro. I've updated macOS to 13.5.2, and I'm running Python 3.10.9 and the pip for Python 3.10.

Not sure if this is relevant, but I am also getting the following message when I load the model using the WebUI (https://github.com/oobabooga/text-generation-webui):

UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'

Appreciate any thoughts on what is going wrong here.

This appears to have been resolved elsewhere; the fix requires installing the PyTorch nightlies, not the stable release.

https://github.com/pytorch/pytorch/issues/96610#issuecomment-1597314364
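
For anyone else applying that fix, a rough way to confirm the MPS backend actually works after the reinstall (a quick sketch, independent of the web UI) is to time a small layer on "cpu" versus "mps"; if the MPS run is not clearly faster, the backend is probably not being used:

import time
import torch
import torch.nn as nn

# Time a matmul-heavy layer on both devices.
layer = nn.Linear(4096, 4096)
x = torch.randn(64, 4096)
devices = ["cpu", "mps"] if torch.backends.mps.is_available() else ["cpu"]
for device in devices:
    m = layer.to(device)
    xd = x.to(device)
    start = time.time()
    for _ in range(50):
        _ = m(xd)
    if device == "mps":
        torch.mps.synchronize()  # wait for queued GPU work before stopping the timer
    print(device, round(time.time() - start, 3), "seconds")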

But having implemented the change, my inference time is still unusably slow at 0.02 tokens/s. Does anyone know why that might be? Thanks in advance.
