Which llama.cpp version should I use?
llama.cpp version:
commit 553f1e46e9e864514bbd6bf4009146db66be0541 (HEAD, tag: b4600, origin/master, origin/HEAD)
Author: Olivier Chafik <ochafik@users.noreply.github.com>
Date: Thu Jan 30 22:01:06 2025 +0000
This is the error log:
llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'deepseek-r1-qwen'
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: failed to load model '/home/xxx/Models/DeepSeek-R1-Distill-Qwen-32B-Uncensored-GGUF/DeepSeek-R1-Distill-Qwen-32B-Uncensored.Q8_0.gguf'
main: error: unable to load model
The quants were done with b4526, and any later version should work. b4600 has support for that pre-tokenizer, so you are likely not running the version you think you are, but an older one.
Just tested it; it loads and works fine with both b4526 and b4600.
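If you want to double-check which build you are actually running, here is a rough sketch, assuming a standard CMake build of the llama.cpp repo (paths, binary names, and the --version flag may differ depending on your setup and build method):

# inside your llama.cpp checkout: confirm which tag/commit is checked out
git fetch --tags
git describe --tags        # should print b4600 (or a later tag)

# rebuild so the binaries actually match the checkout
cmake -B build
cmake --build build --config Release -j

# ask the freshly built binary which build it is; if it reports something
# older than 4526, a stale binary elsewhere on your PATH is being picked up
./build/bin/llama-cli --version

If the version printed by the binary does not match the commit you quoted above, you are invoking an older install rather than the b4600 checkout.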