About the tokenizer - Why use LLaMA tokenizer?
#5 · opened by shuyuej
I noticed that the model is based on Mistral, but the tokenizer comes from LLaMA.
I am confused because the special token IDs differ between the two.
Could you please explain the reasoning behind this choice?
https://huggingface.co/Salesforce/SFR-Embedding-2_R/blob/main/tokenizer_config.json#L42
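For reference, here is a minimal sketch of how one can inspect the special-token entries a `tokenizer_config.json` declares. The config snippet below uses placeholder values for illustration, not the actual SFR-Embedding-2_R config:

```python
import json

# Hypothetical tokenizer_config.json snippet (placeholder values,
# not the actual SFR-Embedding-2_R config).
sample_config = json.loads("""
{
  "tokenizer_class": "LlamaTokenizer",
  "bos_token": "<s>",
  "eos_token": "</s>",
  "added_tokens_decoder": {
    "0": {"content": "<unk>"},
    "1": {"content": "<s>"},
    "2": {"content": "</s>"}
  }
}
""")

# Report which tokenizer class the config declares and its special tokens.
print("tokenizer_class:", sample_config["tokenizer_class"])
for token_id, info in sample_config["added_tokens_decoder"].items():
    print(f"id {token_id}: {info['content']}")
```

Comparing these fields against the base model's own `tokenizer_config.json` is what surfaced the mismatch in special token IDs.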
Thank you very much in advance!