Tokenizer padding token
#76
by Rish1 - opened
I'm attempting a finetune of Llama 3.1 8B Instruct, but neither config.json nor tokenizer.json seems to assign a padding token, which is odd to me. For batched tokenizer requests, do I just pad with the EOS token?
Please share more context and error logs.
As I understand it, there is no padding token defined for this model. If that's the case, please try this.
Note: do this before training.
# The Llama 3.1 tokenizer ships without a pad token, so reuse EOS for padding
if not tokenizer.pad_token:
    tokenizer.pad_token = tokenizer.eos_token
    print(f"tokenizer.pad_token set to {tokenizer.pad_token}")
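To answer the batching part of the question: yes, reusing EOS as the pad token works for causal-LM finetuning, as long as the padded positions are excluded from the loss (the attention_mask, or labels of -100, handle that in most training setups). A minimal sketch, assuming the meta-llama/Meta-Llama-3.1-8B-Instruct checkpoint and the transformers AutoTokenizer; the prompts are just placeholders:

from transformers import AutoTokenizer

# Assumption: the gated meta-llama repo; swap in your local path if needed.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

# Reuse EOS as the pad token since none is defined in the tokenizer config.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Batched tokenization now pads shorter sequences up to the longest in the batch.
batch = tokenizer(
    ["Short prompt.", "A somewhat longer prompt that needs no padding."],
    padding=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)    # (2, max_len_in_batch)
print(batch["attention_mask"][0])  # zeros mark the padded positions

If you then hand the model to a trainer, it is also worth setting model.config.pad_token_id = tokenizer.pad_token_id so that generation and loss masking stay consistent with the tokenizer.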