decode throwing error for TransfoXLTokenizer
#3 by dsplog
Kindly see the code snippet below. I can see that tokenizer.sym2idx is defined, but tokenizer.idx2sym is an empty list, so decode fails with an IndexError.
>>> from transformers import TransfoXLTokenizer
>>> tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
>>> enc = tokenizer.encode("Hello, my dog is cute")
>>> enc
[14049, 2, 617, 3225, 23, 16072]
>>> tokenizer.decode
<bound method PreTrainedTokenizerBase.decode of TransfoXLTokenizer(name_or_path='transfo-xl-wt103', vocab_size=0, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<eos>', 'unk_token': '<unk>', 'additional_special_tokens': ['<formula>']}, clean_up_tokenization_spaces=True), added_tokens_decoder={
0: AddedToken("<eos>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
24: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
3039: AddedToken("<formula>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}>
>>> tokenizer.decode(enc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/home/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 3738, in decode
    return self._decode(
  File "/home/home/.local/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 1001, in _decode
    filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
  File "/home/home/.local/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 982, in convert_ids_to_tokens
    tokens.append(self._convert_id_to_token(index))
  File "/home/home/.local/lib/python3.8/site-packages/transformers/models/transfo_xl/tokenization_transfo_xl.py", line 451, in _convert_id_to_token
    return self.idx2sym[idx]
IndexError: list index out of range
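To illustrate why an empty idx2sym produces this error, here is a minimal sketch on a toy vocabulary (the tokens and the helper name rebuild_idx2sym are hypothetical, not part of transformers). It shows how an index-to-symbol list relates to a symbol-to-index dict. Whether assigning such a list to a real TransfoXLTokenizer restores decode is an untested assumption, not a confirmed fix.

```python
def rebuild_idx2sym(sym2idx):
    """Invert a symbol->index dict into an index->symbol list.

    Hypothetical helper for illustration only. It assumes the indices
    are exactly 0..n-1 with no gaps, which a list lookup requires.
    """
    ordered = sorted(sym2idx.items(), key=lambda item: item[1])
    indices = [idx for _, idx in ordered]
    if indices != list(range(len(ordered))):
        raise ValueError("indices must be contiguous and start at 0")
    return [sym for sym, _ in ordered]


# Toy vocabulary (made up for this example).
toy_sym2idx = {"<eos>": 0, "hello": 1, "dog": 2}

# With an empty list, any lookup fails the same way as in the traceback.
empty_idx2sym = []
try:
    empty_idx2sym[1]
except IndexError as err:
    print("empty list lookup failed:", err)

# With the inverse mapping rebuilt, the lookup succeeds.
toy_idx2sym = rebuild_idx2sym(toy_sym2idx)
print(toy_idx2sym[1])
```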
I checked the version history of transformers: https://pypi.org/project/transformers/#history
The issue does not occur up to transformers v4.33.3.
It appears in v4.34.0 and persists through v4.35.2.
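A quick way to check whether an environment falls inside the range reported above is sketched below. The helper names (parse_release, is_affected, installed_transformers_version) are made up for this example, and the range bounds are simply the versions mentioned in this thread, not an official list of affected releases.

```python
from importlib import metadata

# Range reported in this thread; treated here as an assumption.
FIRST_AFFECTED = (4, 34, 0)
LAST_AFFECTED = (4, 35, 2)


def parse_release(version):
    """Turn 'X.Y.Z' into a tuple of three ints, ignoring suffixes like 'rc1' or '.dev0'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version):
    """Return True if the version lies in the reported range (inclusive)."""
    return FIRST_AFFECTED <= parse_release(version) <= LAST_AFFECTED


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


version = installed_transformers_version()
if version is not None:
    status = "inside" if is_affected(version) else "outside"
    print(f"transformers {version} is {status} the reported range")
```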
Hello, starting with transformers v4.36, the TransfoXL model and tokenizer will be deprecated due to a security issue.
If version v4.33.3 works for your use-case, we recommend sticking to it. Additionally, we recommend explicitly passing the repo ID (transfo-xl-wt103) and revision (40a186da79458c9f9de846edfaea79c412137f97) to ensure you use the correct checkpoint.
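The recommendation above could look roughly like the following sketch. It assumes transformers v4.33.3 is installed (for example via pip install transformers==4.33.3) and that the Hub is reachable. The function name load_pinned_tokenizer is made up for this example; revision is the standard from_pretrained keyword for pinning a commit.

```python
# Repo ID and commit hash quoted in the reply above.
REPO_ID = "transfo-xl-wt103"
REVISION = "40a186da79458c9f9de846edfaea79c412137f97"


def load_pinned_tokenizer():
    """Load the TransfoXL tokenizer pinned to the revision given in this thread.

    Hypothetical helper; assumes transformers==4.33.3 and network access.
    The import is kept inside the function so that defining it does not
    require transformers to be installed.
    """
    from transformers import TransfoXLTokenizer

    return TransfoXLTokenizer.from_pretrained(REPO_ID, revision=REVISION)


# Usage (not run here because it downloads files from the Hub):
#   tokenizer = load_pinned_tokenizer()
#   enc = tokenizer.encode("Hello, my dog is cute")
#   print(tokenizer.decode(enc))
```

Pinning the revision means later changes to the repository do not silently alter which files are loaded.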