ggml conversion error
This is awesome, but the conversion to ggml for llama.cpp seems to be erroring out.
My command:
python3 convert-gptq-to-ggml.py ../llama_models/vicuna-13b-GPTQ-4bit-128g/vicuna-13b-GPTQ-4bit-128g.pt ../llama_models/vicuna-13b-GPTQ-4bit-128g/tokenizer.model ggml-vicuna-13b-GPTQ-4bit-128g
The error:
Processing non-Q4 variable: model.embed_tokens.weight with shape: torch.Size([32001, 5120]) and type: torch.float16
Processing non-Q4 variable: model.norm.weight with shape: torch.Size([5120]) and type: torch.float16
Converting to float32
Processing non-Q4 variable: lm_head.weight with shape: torch.Size([32001, 5120]) and type: torch.float16
Traceback (most recent call last):
File "/home/sravanth/llama.cpp/convert-gptq-to-ggml.py", line 156, in <module>
convert_q4(f"model.layers.{i}.self_attn.q_proj", f"layers.{i}.attention.wq.weight", permute=True)
File "/home/sravanth/llama.cpp/convert-gptq-to-ggml.py", line 97, in convert_q4
zeros = model[f"{src_name}.zeros"].numpy()
KeyError: 'model.layers.0.self_attn.q_proj.zeros'
Any pointers on fixing it?
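One thing worth checking first (a rough sketch, assuming the .pt file is a plain state dict of tensors): whether the checkpoint actually stores the ".zeros" tensors the converter looks for, or the newer GPTQ-for-LLaMa "qzeros"/"scales" naming, since that is exactly the key the script fails on.

import torch

# Rough diagnostic: list the tensor names stored for the first q_proj layer.
# The converter expects "<name>.zeros"; newer GPTQ-for-LLaMa checkpoints
# typically store packed "<name>.qzeros" plus "<name>.scales" instead.
state = torch.load(
    "../llama_models/vicuna-13b-GPTQ-4bit-128g/vicuna-13b-GPTQ-4bit-128g.pt",
    map_location="cpu",
)
for name, tensor in state.items():
    if "layers.0.self_attn.q_proj" in name:
        print(name, tuple(tensor.shape))

If that prints qzeros/scales rather than zeros, the checkpoint was produced by a newer GPTQ-for-LLaMa than the conversion script targets.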
Use the safetensors file, if possible.
The converter doesn't seem to support safetensors yet, but that may be coming soon: https://github.com/ggerganov/llama.cpp/issues/688
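If the repo does ship a .safetensors file, it can at least be inspected the same way while that issue is open; a minimal sketch using the safetensors library (the filename here is hypothetical):

from safetensors.torch import load_file

# Load a safetensors checkpoint into a dict of tensors and list its contents
# (hypothetical filename; adjust to whatever the repo actually ships).
tensors = load_file("vicuna-13b-GPTQ-4bit-128g.safetensors")
for name in sorted(tensors):
    print(name, tuple(tensors[name].shape))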
Guess I'll wait.
Thanks
Is this: https://huggingface.co/eachadea/ggml-vicuna-13b-4bit
the ggml version of your repo, by any chance? It doesn't say whether GPTQ was used, etc., so I was a bit confused.
I converted it to ggml using https://huggingface.co/eachadea/vicuna-13b (I merged it from the delta weights on my own system).
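In case it helps: "merged from delta weights" means applying Vicuna's published deltas on top of the LLaMA base weights, normally done with FastChat's fastchat.model.apply_delta. Conceptually it is just target = base + delta, tensor by tensor; a rough illustrative sketch (paths are hypothetical, and the real tool also handles the tokenizer, sharded checkpoints, and vocab-size details):

import torch

# Illustrative only: add the Vicuna delta to the LLaMA base, tensor by tensor.
# The real workflow uses fastchat.model.apply_delta, which handles the extra
# bookkeeping (tokenizer, shards, embedding resize) that this sketch skips.
base = torch.load("llama-13b/pytorch_model.bin", map_location="cpu")          # hypothetical path
delta = torch.load("vicuna-13b-delta/pytorch_model.bin", map_location="cpu")  # hypothetical path

merged = {name: base[name] + delta[name] for name in delta}
torch.save(merged, "vicuna-13b/pytorch_model.bin")                            # hypothetical path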
Was GPTQ used in that conversion?
No, ggml is a separate format with its own quantization implementation; GPTQ is not and shouldn't be involved.
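For context, the usual llama.cpp route skips GPTQ entirely: convert the float16 weights to a ggml f16 file with the matching convert script, then quantize with llama.cpp's own quantize tool. Roughly (script names and flags have changed across llama.cpp versions, and the right convert script depends on whether the weights are in the original or HF layout, so treat this as a sketch):

# fp16 weights -> ggml f16 (the "1" selects f16 output)
python3 convert-pth-to-ggml.py models/vicuna-13b/ 1
# ggml f16 -> 4-bit q4_0 (the "2" selects q4_0)
./quantize models/vicuna-13b/ggml-model-f16.bin models/vicuna-13b/ggml-model-q4_0.bin 2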