error when loading the model
Hi,
I am trying to load the model using llama-cpp-python, but I am getting this error:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 197, got 195
Code:
from llama_cpp import Llama

llm = Llama(
    model_path="./models/phi3-128k/Phi-3-mini-128k-instruct-Q4_K_M.gguf",
    n_gpu_layers=-1,           # offload all layers to the GPU
    n_ctx=3072,                # context window
    chat_format="phi-3-chat",
    offload_kqv=True,          # keep the KV cache on the GPU
    split_mode=0,              # LLAMA_SPLIT_MODE_NONE: single GPU, no splitting
    main_gpu=0,
)
I can load and use phi-3-mini-4k in GGUF format (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf), but not the 128k version...
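For what it's worth, the tensor count the loader complains about can be checked directly against the file with the gguf Python package (pip install gguf). A minimal sketch, assuming the same model path as above:

from gguf import GGUFReader

# Read the GGUF header and tensor index without running the model.
reader = GGUFReader("./models/phi3-128k/Phi-3-mini-128k-instruct-Q4_K_M.gguf")

# The loader expected 197 tensors; see how many the file actually contains.
print(f"tensor count: {len(reader.tensors)}")
for t in reader.tensors[:5]:
    print(t.name, t.shape)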
Any hint or advice would be very helpful.
Thanks
Looking into the issue
Any luck? I am running into the same issue, even when converting on my own.
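For anyone trying to reproduce: conversion here means the standard llama.cpp script, roughly along these lines (a sketch; the local model directory and output filename are illustrative):

python convert-hf-to-gguf.py ./Phi-3-mini-128k-instruct --outfile phi3-128k-f16.gguf --outtype f16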
@jkkphys Seems like an issue in llama.cpp. I tried recreating it, but still ran into the same issue.
Lowering the GPU layers to zero sorted it out for me :]
or GPU offloading, or however it's called on your end over there XD
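In case it is unclear where to set that, here is a minimal sketch against the constructor from the first post (n_gpu_layers=0 keeps every layer on the CPU):

from llama_cpp import Llama

# Same model and context as above, but with no layers offloaded to the GPU.
llm = Llama(
    model_path="./models/phi3-128k/Phi-3-mini-128k-instruct-Q4_K_M.gguf",
    n_gpu_layers=0,  # 0 = run fully on the CPU
    n_ctx=3072,
    chat_format="phi-3-chat",
)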
So you are saying that running on the CPU works fine for you? I’ll give it a shot, but that kind of limits the usefulness.