error when loading the model
Hi,
I am trying to load the model using llama-cpp-python, but I am getting this error:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 197, got 195
Code:
from llama_cpp import Llama

llm = Llama(
    model_path="./models/phi3-128k/Phi-3-mini-128k-instruct-Q4_K_M.gguf",
    n_gpu_layers=-1,           # offload all layers to the GPU
    n_ctx=3072,                # context window
    chat_format="phi-3-chat",
    offload_kqv=True,          # keep the KV cache on the GPU
    split_mode=0,              # LLAMA_SPLIT_MODE_NONE: single GPU, no splitting
    main_gpu=0,
)
I can load and use phi-3-mini-4k in GGUF format (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf), but not the 128k version...
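For what it's worth, the tensor count the loader complains about can be checked directly against the file with the gguf Python package (pip install gguf). A minimal sketch, assuming the same model path as above:

from gguf import GGUFReader

# Read the GGUF header and tensor index without running the model.
reader = GGUFReader("./models/phi3-128k/Phi-3-mini-128k-instruct-Q4_K_M.gguf")

# The loader expected 197 tensors; see how many the file actually contains.
print(f"tensor count: {len(reader.tensors)}")
for t in reader.tensors[:5]:
    print(t.name, t.shape)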
Any hint or advice would be very helpful.
Thanks
Looking into the issue
Any luck? I am running into the same issue, even when converting on my own.
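For anyone trying to reproduce: conversion here means the standard llama.cpp script, roughly along these lines (a sketch; the local model directory and output filename are illustrative):

python convert-hf-to-gguf.py ./Phi-3-mini-128k-instruct --outfile phi3-128k-f16.gguf --outtype f16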
@jkkphys Seems like an issue in llama.cpp. I tried recreating it, but still ran into the same issue.
Lowering the GPU layers to zero sorted it out for me :]
or GPU offloading, or however it's called on your end over there XD
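In case it is unclear where to set that, here is a minimal sketch against the constructor from the first post (n_gpu_layers=0 keeps every layer on the CPU):

from llama_cpp import Llama

# Same model and context as above, but with no layers offloaded to the GPU.
llm = Llama(
    model_path="./models/phi3-128k/Phi-3-mini-128k-instruct-Q4_K_M.gguf",
    n_gpu_layers=0,  # 0 = run fully on the CPU
    n_ctx=3072,
    chat_format="phi-3-chat",
)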
So you are saying that running on the CPU works fine for you? I’ll give it a shot, but that kind of limits the usefulness.