Special tokens in output generation
Hello,
Thanks for sharing this model!
When generating output, even with "skip_special_tokens=True", there are still two special tokens at the beginning ( ) and the end (\n) of the output, in addition to special whitespace tokens.
Is there any way to remove them and to use a regular space token instead of the special whitespace tokens?
Thanks a lot for trying the model! Can you try using T5Tokenizer instead of AutoTokenizer, and pass spaces_between_special_tokens=False when decoding?
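For reference, the suggestion above could look roughly like the sketch below. The helper name decode_without_special_tokens is made up for illustration, and model_path and output_ids in the usage comment are placeholders for whatever checkpoint and generation output you are using; only the two decode flags come from this thread.

```python
def decode_without_special_tokens(tokenizer, token_ids):
    """Decode generated ids with the two flags discussed in this thread.

    `tokenizer` is expected to be a slow Hugging Face tokenizer such as
    T5Tokenizer; the helper name itself is hypothetical.
    """
    return tokenizer.decode(
        token_ids,
        skip_special_tokens=True,             # drop special tokens from the text
        spaces_between_special_tokens=False,  # no extra spaces around special tokens
    )


# Usage sketch (assumes `transformers` and `sentencepiece` are installed;
# `model_path` and `output_ids` are placeholders, not values from this thread):
#   from transformers import T5Tokenizer
#   tokenizer = T5Tokenizer.from_pretrained(model_path)
#   text = decode_without_special_tokens(tokenizer, output_ids[0])
```

Using T5Tokenizer (the slow, SentencePiece-based class) matters here because spaces_between_special_tokens is handled by the slow tokenizer's decode path.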
Thanks for your feedback! I have applied all your recommendations, but I still get a newline character (\n) at the end of the generated output.
Any idea?
Hi,
Can you share a screenshot of the problem (input, tokenized input, decoded output, etc.) so that we can walk through it? By the way, here is a question we received on GitHub that seems pretty similar: https://github.com/lm-sys/FastChat/issues/1022. Maybe you can take a look at that as well?
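In the meantime, if the only remaining issue is the trailing newline, a simple post-processing step on the decoded string may be enough as a workaround. This is only a sketch: the function name is hypothetical, and it assumes that leading and trailing whitespace is never meaningful in your outputs. It does not address why the newline is generated in the first place.

```python
def strip_generation_whitespace(text: str) -> str:
    """Remove leading/trailing whitespace (including newlines) from decoded text.

    Workaround only: assumes surrounding whitespace carries no meaning.
    Whitespace inside the text is left untouched.
    """
    return text.strip()


if __name__ == "__main__":
    decoded = " Hello, how can I help you?\n"
    print(repr(strip_generation_whitespace(decoded)))
```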