
Hi. Thanks for your great models. But you should upload added_tokens.json to all of them.

#7 opened by NikolayKozloff

The online HF converter GGUF-my-repo raises an error when I try to convert your models to GGUF:

Error: Error converting to fp16: Traceback (most recent call last):
  File "/home/user/app/llama.cpp/convert.py", line 1548, in <module>
    main()
  File "/home/user/app/llama.cpp/convert.py", line 1542, in main
    OutputFile.write_all(outfile, ftype, params, model, vocab, special_vocab,
  File "/home/user/app/llama.cpp/convert.py", line 1207, in write_all
    check_vocab_size(params, vocab, pad_vocab=pad_vocab)
  File "/home/user/app/llama.cpp/convert.py", line 1049, in check_vocab_size
    raise ValueError(msg)
ValueError: Vocab size mismatch (model has 57344, but SambaLingo-Hungarian-Base/tokenizer.model has 52603). Add the --pad-vocab option and try again.
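For anyone hitting this, here is a minimal sketch of the check that fails, assuming a local snapshot of the repo (the directory name is just the one from the traceback) and the sentencepiece package:

```python
# Minimal sketch of the check convert.py performs: it reads tokenizer.model
# (SentencePiece) plus added_tokens.json, and expects their combined size to
# match the model's embedding rows (config.vocab_size = 57344 here).
import json
from pathlib import Path

import sentencepiece as spm  # pip install sentencepiece

model_dir = Path("SambaLingo-Hungarian-Base")  # assumed local snapshot of the repo

sp = spm.SentencePieceProcessor(model_file=str(model_dir / "tokenizer.model"))
base_vocab = sp.vocab_size()  # 52603 for this model

added_path = model_dir / "added_tokens.json"
added = json.loads(added_path.read_text()) if added_path.exists() else {}

print("tokenizer.model pieces:     ", base_vocab)
print("added_tokens.json entries:  ", len(added))
print("total visible to convert.py:", base_vocab + len(added))
# Without added_tokens.json the total stays at 52603, the embedding matrix
# has 57344 rows, and check_vocab_size() raises the mismatch error above.
```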

The same error occurs locally if I run convert.py from the llama.cpp GitHub repo on my laptop. The --pad-vocab option without added_tokens.json spoils the model by inserting tokens in the wrong places.
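In case it is useful, here is a sketch of how the missing file could be reconstructed by diffing the fast tokenizer's full vocabulary (tokenizer.json) against tokenizer.model. This assumes a BPE-style tokenizer.json where ["model"]["vocab"] is a {token: id} mapping, the usual layout for Llama-family tokenizers; I have not verified it against your training setup:

```python
# Sketch: derive added_tokens.json from the difference between the fast
# tokenizer's full vocabulary (tokenizer.json) and the SentencePiece model.
# Assumes a BPE-style tokenizer.json where ["model"]["vocab"] is {token: id}.
import json
from pathlib import Path

import sentencepiece as spm

model_dir = Path("SambaLingo-Hungarian-Base")  # assumed local snapshot of the repo

sp = spm.SentencePieceProcessor(model_file=str(model_dir / "tokenizer.model"))
base = {sp.id_to_piece(i) for i in range(sp.vocab_size())}

tok = json.loads((model_dir / "tokenizer.json").read_text(encoding="utf-8"))
full_vocab = tok["model"]["vocab"]

# Anything in the fast tokenizer but not in tokenizer.model must be an
# added token; keep its original id so the embedding rows line up.
added = {t: i for t, i in full_vocab.items() if t not in base}
(model_dir / "added_tokens.json").write_text(
    json.dumps(added, ensure_ascii=False, indent=2), encoding="utf-8"
)
print(f"wrote {len(added)} entries to added_tokens.json")
```

With that file in place, a plain `python convert.py SambaLingo-Hungarian-Base` run (no --pad-vocab) should see all 57344 tokens, if my reading of check_vocab_size is right.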

SambaNova Systems org

Hi @NikolayKozloff, so sorry for the delayed response.

Could you share what happens when you use --pad-vocab and why it spoils the model? That seems strange to us.

Also, how were you able to create quantized GGUF versions of the other models without any issues?

Thank you for your feedback; we will try to fix this issue!
