On Windows: UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 32549

#55
by Pioupiouu - opened

Hello, I got an exception when calling:
self.tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")

The error:
File "D:\nemo_test\nemo_api_chat\nemo_api_chat\services\ai_service.py", line 46, in __init__
self.tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\nemo_user_test\anaconda3\envs\nemo\Lib\site-packages\mistral_common\tokens\tokenizers\mistral.py", line 110, in from_file
tokenizer = Tekkenizer.from_file(tokenizer_filename)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\nemo_user_test\anaconda3\envs\nemo\Lib\site-packages\mistral_common\tokens\tokenizers\tekken.py", line 130, in from_file
model_data: ModelData = json.load(f)
^^^^^^^^^^^^
File "C:\Users\nemo_user_test\anaconda3\envs\nemo\Lib\json\__init__.py", line 293, in load
return loads(fp.read(),
^^^^^^^^^
File "C:\Users\nemo_user_test\anaconda3\envs\nemo\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 32549: character maps to <undefined>

Hope someone can help me

I fixed similar issues by switching my Windows environment to UTF-8, an option that is still in beta on Windows 11. In my case I was entering emojis and Python didn't like it.
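
For what it's worth, the traceback suggests the file is being decoded with the Windows default code page (cp1252) rather than UTF-8, and byte 0x81 has no mapping in cp1252. Below is a minimal sketch of that failure and of reading the same bytes with an explicit encoding. The helper name load_json_utf8 and the sample file are hypothetical, made up for illustration; this is not the mistral_common code and it does not patch the library.

```python
import json
import tempfile
from pathlib import Path


def load_json_utf8(path):
    """Hypothetical helper: read a JSON file as UTF-8, ignoring the platform default encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.json"  # stand-in file, not the real tekken.json
    # U+00C1 is encoded as bytes 0xC3 0x81 in UTF-8; 0x81 is undefined in cp1252.
    payload = json.dumps({"token": "\u00c1"}, ensure_ascii=False)
    sample.write_bytes(payload.encode("utf-8"))

    # Decoding the raw bytes as cp1252 mimics what open() does by default
    # on many Windows setups when no encoding is passed.
    try:
        sample.read_bytes().decode("cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Reading with an explicit encoding works on every platform.
    data = load_json_utf8(sample)
```

Since the open() call in the traceback lives inside the library, the explicit encoding cannot be passed from user code. One option that should have the same effect without touching Windows settings is Python's UTF-8 mode: set the environment variable PYTHONUTF8=1 (or start the interpreter with python -X utf8) before running the script.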
