Error while trying to load this model
Traceback (most recent call last):
  File "C:\Users\zolte\Downloads\oobabooga_windows\oobabooga_windows\text-generation-webui\server.py", line 69, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\zolte\Downloads\oobabooga_windows\oobabooga_windows\text-generation-webui\modules\models.py", line 94, in load_model
    output = load_func(model_name)
  File "C:\Users\zolte\Downloads\oobabooga_windows\oobabooga_windows\text-generation-webui\modules\models.py", line 296, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "C:\Users\zolte\Downloads\oobabooga_windows\oobabooga_windows\text-generation-webui\modules\AutoGPTQ_loader.py", line 53, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "C:\Users\zolte\Downloads\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling\auto.py", line 83, in from_quantized
    return quant_func(
  File "C:\Users\zolte\Downloads\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling\_base.py", line 749, in from_quantized
    make_quant(
  File "C:\Users\zolte\Downloads\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling\_utils.py", line 92, in make_quant
    make_quant(
  File "C:\Users\zolte\Downloads\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling\_utils.py", line 92, in make_quant
    make_quant(
  File "C:\Users\zolte\Downloads\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling\_utils.py", line 92, in make_quant
    make_quant(
  [Previous line repeated 1 more time]
  File "C:\Users\zolte\Downloads\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling\_utils.py", line 84, in make_quant
    new_layer = QuantLinear(
  File "C:\Users\zolte\Downloads\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\nn_modules\qlinear\qlinear_cuda_old.py", line 83, in __init__
    self.autogptq_cuda = autogptq_cuda_256
NameError: name 'autogptq_cuda_256' is not defined
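For what it's worth, the NameError at the bottom points at the compiled AutoGPTQ CUDA extension never being built or imported during install. A quick sanity check, just a sketch run from the same Python environment the webui uses (the module name is taken straight from the traceback above):

# Check whether the AutoGPTQ CUDA extension is actually importable.
try:
    import autogptq_cuda_256  # compiled kernel the old QuantLinear path expects
    print("autogptq_cuda_256 imported OK - CUDA kernels are available")
except ImportError:
    print("autogptq_cuda_256 is missing - AutoGPTQ was installed without its CUDA extension;")
    print("reinstall a wheel built for your CUDA/torch version, or try use_triton=True as a fallback")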
Same here... I'm trying to run "TheBloke/WizardLM-Uncensored-Falcon-7B-GPTQ"
edit:
I can load the model by setting this:
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0", use_triton=True, use_safetensors=True, torch_dtype=torch.float32, trust_remote_code=True)
so for me the fix was use_triton=True
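In case it helps anyone else, here is a minimal, self-contained version of the snippet above. The model directory is just a placeholder (pointing it at the Falcon GPTQ repo mentioned earlier), and the generate call at the end is only there to confirm the model actually responds:

import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Placeholder: substitute your own local folder or HF repo id.
quantized_model_dir = "TheBloke/WizardLM-Uncensored-Falcon-7B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    device="cuda:0",
    use_triton=True,          # route through the Triton kernels instead of the missing CUDA extension
    use_safetensors=True,
    torch_dtype=torch.float32,
    trust_remote_code=True,   # Falcon needs its custom modelling code
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))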
For me, setting use_triton=True does not solve it. Any ideas?
Hello!
I'm having a problem loading the model.
The command runs, but loading stops after a few seconds and a "Killed" message is returned.
See below:
~/text-generation-webui$ python3 server.py --loader autogptq --gpu-memory 5000MiB --model TheBloke_vicuna-13B-1.1-GPTQ-4bit-128g
2023-07-23 19:42:13 INFO:Loading TheBloke_vicuna-13B-1.1-GPTQ-4bit-128g...
2023-07-23 19:42:13 INFO:The AutoGPTQ params are: {'model_basename': 'vicuna-13B-1.1-GPTQ-4bit-128g.compat.no-act-order', 'device': 'cuda:0', 'use_triton': False, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': False, 'trust_remote_code': False, 'max_memory': {0: '5000MiB', 'cpu': '99GiB'}, 'quantize_config': None, 'use_cuda_fp16': True}
Killed
Has anyone else run into this? Any suggestions?
Thx!
"Killed" means you ran out of RAM. GPTQ models need to load in to RAM first, and then they move to GPU. So you need to run on a system with more available RAM, or add some swap space if you're able to.
I've increased the swap file size, which solved the RAM issue. Now the limiting factor is my GPU, a GTX 1660 S with only 6GB of VRAM. I will try a smaller model. Thx!