No tokenizer.model???

Traceback (most recent call last):
  File "/exllamav2/convert.py", line 69, in <module>
    tokenizer = ExLlamaV2Tokenizer(config)
  File "/exllamav2/exllamav2/tokenizer.py", line 65, in __init__
    if os.path.exists(path_spm) and not force_json: self.tokenizer = ExLlamaV2TokenizerSPM(path_spm)
  File "/exllamav2/exllamav2/tokenizers/spm.py", line 9, in __init__
    self.spm = SentencePieceProcessor(model_file = tokenizer_model)
  File "/venv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 447, in Init
    self.Load(model_file=model_file, model_proto=model_proto)
  File "/venv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/venv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

quanting without the model file seems to work

brucethemoose

Owner Jan 10, 2024

Yeah I just got the same error actually.

brucethemoose

Owner Jan 10, 2024

Does anyone know how to even make an old tokenizer.model file?

Technically another tokenizer could be used but some tokens from the union tokenizer merge may be missing.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Your need to confirm your account before you can post a new comment.

· Sign up or log in to comment