Error when using the text-generation-webui API with the model

#12
by carlosbdw - opened

GPU: A40(48GB) * 1
CPU: 15 vCPU AMD EPYC 7543 32-Core Processor
MEM: 80GB

/root/text-generation-webui
bin /root/miniconda3/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116.so
INFO:Loading guanaco-65B-GPTQ...
CUDA extension not installed.
INFO:Found the following quantized model: models/guanaco-65B-GPTQ/Guanaco-65B-GPTQ-4bit.act-order.safetensors
Traceback (most recent call last):
  File "/root/text-generation-webui/server.py", line 1102, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/root/text-generation-webui/modules/models.py", line 97, in load_model
    output = load_func(model_name)
  File "/root/text-generation-webui/modules/models.py", line 291, in GPTQ_loader
    model = modules.GPTQ_loader.load_quantized(model_name)
  File "/root/text-generation-webui/modules/GPTQ_loader.py", line 177, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "/root/text-generation-webui/modules/GPTQ_loader.py", line 84, in _load_quant
    model.load_state_dict(safe_load(checkpoint), strict=False)
  File "/root/miniconda3/lib/python3.10/site-packages/safetensors/torch.py", line 259, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer

This looks like the model didn't download fully. Check that you have enough disk space, then try the download again. text-gen-ui's downloader auto-resumes, so you don't need to re-download the whole thing; it will only fetch whatever parts are missing.
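Not part of the original reply, but as a rough illustration: the Python sketch below checks whether the safetensors header deserializes cleanly and, if it doesn't, resumes the download via huggingface_hub. The repo id TheBloke/guanaco-65B-GPTQ, the local folder, and the file name are assumptions taken from the log above; adjust them to your setup.

# Sketch: verify the quantized weights file and resume a partial download.
# Assumes huggingface_hub, safetensors and torch are installed, and that the
# repo id / paths below match your setup (they are guesses based on the log).
from pathlib import Path

from huggingface_hub import snapshot_download
from safetensors import safe_open

local_dir = Path("models/guanaco-65B-GPTQ")
weights = local_dir / "Guanaco-65B-GPTQ-4bit.act-order.safetensors"

def header_is_complete(path: Path) -> bool:
    # Opening the file is enough to parse the header; a truncated download
    # raises SafetensorError: MetadataIncompleteBuffer at this point.
    try:
        with safe_open(str(path), framework="pt", device="cpu"):
            return True
    except Exception:
        return False

if not weights.exists() or not header_is_complete(weights):
    # Re-fetch any missing or incomplete files into the same folder.
    snapshot_download(
        repo_id="TheBloke/guanaco-65B-GPTQ",
        local_dir=str(local_dir),
        resume_download=True,
    )

The equivalent from the shell, using text-gen-ui's own downloader, would be something like python download-model.py TheBloke/guanaco-65B-GPTQ, run from the text-generation-webui directory.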
