Vocab size mismatch with ggml
#9 opened by 0xK1ller
Trying to convert the model to ggml results in the following exception:
File "D:\Games\huggingface\llama.cpp\convert.py", line 1149, in <module>
main()
File "D:\Games\huggingface\llama.cpp\convert.py", line 1144, in main
OutputFile.write_all(outfile, params, model, vocab)
File "D:\Games\huggingface\llama.cpp\convert.py", line 942, in write_all
check_vocab_size(params, vocab)
File "D:\Games\huggingface\llama.cpp\convert.py", line 896, in check_vocab_size
raise Exception(msg)
Exception: Vocab size mismatch (model has 32001, but D:\Games\huggingface\huggingface\models\vicuna-7b\tokenizer.model has 32000). Most likely you are missing added_tokens.json (should be in D:\Games\huggingface\huggingface\models\vicuna-7b).```
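A quick sanity check (a sketch, assuming the `sentencepiece` Python package is installed) confirms that the tokenizer side really only has 32000 pieces:

```python
import sentencepiece as spm

# Confirm the tokenizer side really reports 32000 pieces,
# one short of the model's 32001.
sp = spm.SentencePieceProcessor(
    model_file=r"D:\Games\huggingface\huggingface\models\vicuna-7b\tokenizer.model"
)
print(sp.vocab_size())  # prints 32000
```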
I verified all checksums, and I'm on the latest commit of llama.cpp. Does anyone know how to resolve this?
Edit: I also tried it on Google Colab in case something was wrong with my local environment, but I get the same error there.
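If I'm reading the error message right, the workaround would be to create an added_tokens.json that maps the one extra token to id 32000. A minimal sketch of what I think the file needs; the token string "[PAD]" is a guess on my part (Vicuna's fine-tuning reportedly added a pad token), so please correct me if the actual string differs:

```python
import json
from pathlib import Path

# Sketch of the workaround: write an added_tokens.json that maps the
# single extra token to id 32000, so convert.py sees 32001 tokens on
# the tokenizer side as well.
# "[PAD]" is an assumption; check the model card for the exact string.
model_dir = Path(r"D:\Games\huggingface\huggingface\models\vicuna-7b")
with open(model_dir / "added_tokens.json", "w", encoding="utf-8") as f:
    json.dump({"[PAD]": 32000}, f)
```

With that file in place, convert.py should see matching vocab sizes on both sides, but I'd appreciate confirmation that this is the intended fix.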