Tokenizer does not add EOS token but was used in the paper

#2
by Owos - opened

The OLMo tokenizer does not add an eos_token to its output, but the authors said they used the eos_token during training. Why is this different?

https://huggingface.co/allenai/OLMo-1B-0724-hf/blob/main/tokenizer_config.json#L3

The OLMo tokenizer doesn't automatically append the eos_token to its output, even though the authors used it during training. The token has to be added manually during data processing or model setup. In tokenizer_config.json, automatic EOS insertion is not enabled, so the eos_token is defined but not appended as part of the default tokenization step. This leaves it to users to add the token themselves, which gives flexibility for use cases (e.g. generation prompts) where a trailing EOS is not wanted.
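A minimal sketch of the manual step described above. The helper `append_eos` is hypothetical (not part of the transformers API); it takes the token ids produced by any tokenizer call and appends the tokenizer's `eos_token_id` if it is not already present:

```python
def append_eos(input_ids: list[int], eos_token_id: int) -> list[int]:
    """Append eos_token_id unless the sequence already ends with it."""
    if input_ids and input_ids[-1] == eos_token_id:
        return input_ids
    return input_ids + [eos_token_id]


# Example with made-up token ids; with a real tokenizer you would use
# something like:
#   tok = AutoTokenizer.from_pretrained("allenai/OLMo-1B-0724-hf")
#   ids = append_eos(tok(text)["input_ids"], tok.eos_token_id)
ids = append_eos([101, 2054, 2003], eos_token_id=0)
print(ids)  # the eos id 0 is now at the end
```

Doing this once in the data-processing pipeline (rather than at every call site) keeps training examples consistent with how the paper describes the tokenization.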
