Question regarding added tokens vs llama base
#7 by vince62s - opened
Hello,
I have some questions regarding the 7 added tokens.
Are the embeddings learned at fine-tuning time, or is this just a "pre/post" processing usage?
Also, can you clarify the meaning of those tokens?
Hey there,
The added tokens are there for flexibility if you want to fine-tune the model for some specific use case (e.g., MASK or CLS tokens). During SFT we only explicitly used the <|im_start|> token and the <|im_end|> token, which is redefined as the EOS token.
Their embeddings are learned at fine-tuning time.
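For readers wondering how added tokens end up with learnable embeddings, below is a minimal sketch using the Hugging Face transformers API (not the authors' actual training code); the base model name is a placeholder. The key steps are registering the extra tokens with the tokenizer, redefining <|im_end|> as the EOS token, and resizing the embedding matrix so the new rows can be updated during fine-tuning.

```python
# Minimal sketch, not the authors' exact setup: giving added special tokens
# such as <|im_start|> / <|im_end|> trainable embeddings before fine-tuning.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-2-7b-hf"  # placeholder base model name
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Register the extra tokens; <|im_end|> is redefined as the EOS token.
tokenizer.add_special_tokens(
    {"eos_token": "<|im_end|>", "additional_special_tokens": ["<|im_start|>"]}
)

# Grow the embedding (and LM head) matrix so the new tokens get their own
# rows; these are randomly initialized and then learned during fine-tuning.
model.resize_token_embeddings(len(tokenizer))
model.config.eos_token_id = tokenizer.eos_token_id

# SFT data would then be wrapped in the ChatML-style format the reply describes:
example = "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n"
print(tokenizer(example)["input_ids"])
```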
nunonmg changed discussion status to closed