tokenizer.add_eos_token and maxlength
#15, opened by Gregorioz
In the old version of the demo code, tokenization set `tokenizer.add_eos_token = True` and called `tokenizer(..., maxlength=maxlength-1, ...)`. A recent update removed `tokenizer.add_eos_token = True` and changed the value of the `maxlength` argument from `maxlength-1` to `maxlength`. What are the differences between the two tokenization methods? Is there any risk, given that I observe divergent embeddings between them?
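To make the question concrete, here is a toy sketch of how the two settings could differ at the token level. It does not call the real tokenizer: `encode`, `EOS_ID`, and the example ids are hypothetical, and it assumes the length limit counts the EOS token when one is appended.

```python
EOS_ID = 2  # hypothetical EOS token id, for illustration only


def encode(token_ids, limit, add_eos):
    """Toy stand-in for a tokenizer call with truncation enabled.

    Assumption: when an EOS token is appended, it counts toward the limit.
    """
    budget = limit - 1 if add_eos else limit
    out = list(token_ids[:max(budget, 0)])  # truncate the text tokens
    if add_eos:
        out.append(EOS_ID)  # the sequence always ends with EOS
    return out


maxlength = 8
short_text = [11, 12, 13]          # fits within the limit
long_text = list(range(100, 120))  # must be truncated

# Old demo settings: EOS appended, limit of maxlength - 1.
old_short = encode(short_text, maxlength - 1, add_eos=True)
old_long = encode(long_text, maxlength - 1, add_eos=True)

# New demo settings: no EOS, limit of maxlength.
new_short = encode(short_text, maxlength, add_eos=False)
new_long = encode(long_text, maxlength, add_eos=False)

print(old_short, new_short)
print(old_long, new_long)
```

Under these assumptions the final token differs for every input (EOS in the old path, the last text token in the new one), and long inputs keep more text tokens in the new path. If the embedding is taken from the last token's hidden state, which I have not confirmed for this model, that alone might account for the divergence.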