Running on llama.cpp
When trying to run with llama.cpp
./llama.cpp/server --port 8002 --host 0.0.0.0 -m llama.cpp/models/Mistral-Nemo-Instruct-2407-Q5_K_M.gguf -c 128000
I got: error loading model: create_tensor: tensor 'blk.0.attn_q.weight' has wrong shape; expected 5120, 5120, got 5120, 4096, 1, 1
llama.cpp does not support this model yet.
They just added support for the tokenizer a few hours ago; a few other things still need to go in, though.
It's in none of the releases yet. The change hasn't been merged.
https://github.com/ggerganov/llama.cpp/issues/8577
https://github.com/ggerganov/llama.cpp/pull/8579
The GGUF models have already been updated; they are based on llama.cpp b3438. If you run into any further issues, please let us know. Thanks a lot!
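Since the wrong-shape error comes from loading a Nemo GGUF with a llama.cpp build that predates the merged support, a quick sanity check is to compare your build number against b3438 before retrying. The sketch below parses a version string of the form `version: NNNN (hash)`; the sample string is a placeholder, and on a real install you would capture it from the binary's `--version` output instead.

```shell
#!/bin/sh
# Minimal sketch: check whether a llama.cpp build number is new enough
# for Mistral-Nemo GGUF support, which landed around build b3438.
# The version string below is an illustrative placeholder, not real output.
version_line="version: 3438 (abcdef0)"
required=3438

# Extract the numeric build number from the version string.
build=$(printf '%s\n' "$version_line" | sed -n 's/^version: \([0-9][0-9]*\).*/\1/p')

if [ "$build" -ge "$required" ]; then
  echo "build $build: new enough for Mistral Nemo GGUF"
else
  echo "build $build: too old, rebuild llama.cpp (need >= b$required)"
fi
```

If the check fails, pulling the latest llama.cpp, rebuilding, and re-downloading the regenerated GGUF should resolve the tensor-shape error.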