decode throwing error for TransfoXLTokenizer

#3
by dsplog - opened

Kindly see the code snippet below. I can see that tokenizer.sym2idx is defined, but tokenizer.idx2sym is an empty list, so decode fails with an IndexError.

>>> from transformers import TransfoXLTokenizer
>>> tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")

>>> enc = tokenizer.encode("Hello, my dog is cute")
>>> enc
[14049, 2, 617, 3225, 23, 16072]

>>> tokenizer.decode
<bound method PreTrainedTokenizerBase.decode of TransfoXLTokenizer(name_or_path='transfo-xl-wt103', vocab_size=0, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<eos>', 'unk_token': '<unk>', 'additional_special_tokens': ['<formula>']}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
    0: AddedToken("<eos>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    24: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    3039: AddedToken("<formula>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}>
>>> tokenizer.decode(enc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/home/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 3738, in decode
    return self._decode(
  File "/home/home/.local/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 1001, in _decode
    filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
  File "/home/home/.local/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 982, in convert_ids_to_tokens
    tokens.append(self._convert_id_to_token(index))
  File "/home/home/.local/lib/python3.8/site-packages/transformers/models/transfo_xl/tokenization_transfo_xl.py", line 451, in _convert_id_to_token
    return self.idx2sym[idx]
IndexError: list index out of range

I checked the version history of transformers (https://pypi.org/project/transformers/#history). The issue does not occur up to transformers v4.33.3.

From v4.34.0 through v4.35.2, the issue is present.
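For anyone wanting to check whether their install falls in that range, here is a minimal sketch. The version bounds are only the ones reported in this post, not an official list, and the helper names (parse_version, in_reported_range) are made up for illustration.

```python
import re

# Bounds as reported in this thread: decode works up to 4.33.3 and
# fails from 4.34.0 through 4.35.2. These are observations, not an official list.
FIRST_AFFECTED = (4, 34, 0)
LAST_AFFECTED = (4, 35, 2)


def parse_version(version):
    """Return the leading major.minor.patch of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. "4.34") is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def in_reported_range(version):
    """True if the version lies inside the range reported as failing."""
    return FIRST_AFFECTED <= parse_version(version) <= LAST_AFFECTED


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        print(installed, "in reported failing range:", in_reported_range(installed))
```

A result of False only means the version is outside the range tested here; it says nothing about versions nobody in this thread has tried.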

Transformer-XL community org

Hello, starting with transformers v4.36, the TransfoXL model and tokenizer will be deprecated due to a security issue.

If version v4.33.3 works for your use-case, we recommend sticking to it. Additionally, we recommend explicitly passing the repo ID (transfo-xl-wt103) and revision (40a186da79458c9f9de846edfaea79c412137f97) to ensure you use the correct checkpoint.
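A minimal sketch of what pinning could look like, assuming transformers v4.33.3 is installed (for example via pip install transformers==4.33.3) and the Hub is reachable. The repo ID and revision are the ones given above; the function name load_pinned_tokenizer is only illustrative.

```python
# Repo ID and commit hash taken from the recommendation in this thread.
REPO_ID = "transfo-xl-wt103"
REVISION = "40a186da79458c9f9de846edfaea79c412137f97"


def load_pinned_tokenizer():
    """Load the tokenizer from the exact checkpoint revision.

    Assumes a transformers version that still ships TransfoXLTokenizer
    (v4.33.3 is the one reported as working in this thread).
    """
    # Imported here so the constants above can be reused without transformers installed.
    from transformers import TransfoXLTokenizer

    return TransfoXLTokenizer.from_pretrained(REPO_ID, revision=REVISION)


if __name__ == "__main__":
    tokenizer = load_pinned_tokenizer()
    ids = tokenizer.encode("Hello, my dog is cute")
    # On an unaffected version this should print decoded text instead of raising IndexError.
    print(tokenizer.decode(ids))
```

Passing revision as a full commit hash rather than a branch name means later changes to the repository do not alter which files are downloaded.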
