Any working tag/release/hash of AutoGPTQ?
Hi TheBloke!
I'm trying the Python code from the model card with the latest AutoGPTQ on an A100-40G. The model loads, but I get a failure during inference:
Loading tokenizer...
Loading model...
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
The safetensors archive passed at /model/gptq_model-4bit--1g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
can't get model's sequence length from model config, will set to 4096.
RWGPTQForCausalLM hasn't fused attention module yet, will skip inject fused attention.
RWGPTQForCausalLM hasn't fused mlp module yet, will skip inject fused mlp.
Model loaded in 196.37s
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Traceback (most recent call last):
File "/pkg/modal/_container_entrypoint.py", line 330, in handle_input_exception
yield
File "/pkg/modal/_container_entrypoint.py", line 403, in call_function_sync
res = fun(*args, **kwargs)
File "/root/gptqfalcon.py", line 72, in generate
output = self.model.generate(input_ids=tokens, max_new_tokens=100, do_sample=True, temperature=0.8)
File "/repositories/AutoGPTQ/auto_gptq/modeling/_base.py", line 426, in generate
return self.model.generate(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1565, in generate
return self.sample(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2612, in sample
outputs = self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/modelling_RW.py", line 759, in forward
transformer_outputs = self.transformer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/modelling_RW.py", line 654, in forward
outputs = block(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/modelling_RW.py", line 396, in forward
attn_outputs = self.self_attention(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/modelling_RW.py", line 252, in forward
fused_qkv = self.query_key_value(hidden_states) # [batch_size, seq_length, 3 x hidden_size]
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/repositories/AutoGPTQ/auto_gptq/nn_modules/qlinear_old.py", line 189, in forward
autogptq_cuda.vecquant4matmul_faster_old(x, self.qweight, out, self.scales.float(), self.qzeros, self.group_size, self.half_indim)
AttributeError: module 'autogptq_cuda' has no attribute 'vecquant4matmul_faster_old'
Is there any specific version of AutoGPTQ that's known to be compatible with this model? I really want to give it a try!
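For reference, my generation code is essentially the example from the model card, roughly like this (a simplified sketch; the local path /model and the basename are from my setup, and the surrounding Modal wrapper is omitted):
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM
# Hypothetical local paths for my container; the model card uses the HF repo name instead.
model_dir = "/model"
model_basename = "gptq_model-4bit--1g"
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,  # needed for the custom RW (Falcon) modelling code
    device="cuda:0",
    use_triton=False,
)
prompt = "### Instruction: write a story about llamas\n### Response:"
tokens = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=tokens, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0]))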
Can you confirm you built the latest version from source with these commands?
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip install .
Yes, also installing the extra dependency as noted:
"git clone https://github.com/PanQiWei/AutoGPTQ /repositories/AutoGPTQ",
"cd /repositories/AutoGPTQ && pip install . && pip install einops",
I have a hunch as to what's wrong. In the scripts where I do have AutoGPTQ working, there's a "python setup.py install" step. I'm going to try adding that, as I think it's what actually compiles the CUDA extension.
pip install should do that, but yeah you can run it by hand as well if you want.
Maybe first try:
pip uninstall auto-gptq
pip install .
FYI, PanQiWei just PR'd code that will provide pre-compiled binary wheels for AutoGPTQ, so soon it won't be necessary to compile from source.
Success! Adding && python setup.py install built the CUDA module, fixed the crash, and now I get a response!
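In case it helps anyone else, the relevant image-build lines in my setup now look roughly like this (same commands as before, with the setup.py step appended):
"git clone https://github.com/PanQiWei/AutoGPTQ /repositories/AutoGPTQ",
"cd /repositories/AutoGPTQ && pip install . && pip install einops && python setup.py install",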
### Instruction: write a story about llamas
### Response:
A group of llamas were out exploring the countryside when they stumbled upon an old, forgotten temple. As they walked through the entrance, they were taken aback by the intricate carvings and the sheer size of the temple. They stayed for a while, marveling at the magnificent architecture and absorbing the peaceful energy it exuded. After awhile, they decided to continue their journey, inspired by the temple's beauty and wisdom.<|endoftext|>-1:<|endoftext|>In the distance, the temple glowed with
I'll look into what's up with <|endoftext|> and open another issue if the problem is not in my code.
Thanks!