About the tokenizer - Why use LLaMA tokenizer?

#5
by shuyuej - opened

I noticed that the model is based on Mistral, but the tokenizer is based on LLaMA.
I am confused because the special token IDs differ between the two.
Could you please explain the reason?

https://huggingface.co/Salesforce/SFR-Embedding-2_R/blob/main/tokenizer_config.json#L42
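For context, these are the kinds of fields I am asking about. A minimal sketch of inspecting them locally (the JSON below is a hypothetical excerpt for illustration; the real values are in the linked `tokenizer_config.json`):

```python
import json

# Hypothetical excerpt mirroring fields found in a tokenizer_config.json;
# the actual values come from the file linked above.
config_text = """
{
  "tokenizer_class": "LlamaTokenizer",
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": "<unk>"
}
"""

config = json.loads(config_text)

# Which tokenizer implementation the config declares, and its special tokens.
print(config["tokenizer_class"])
for key in ("bos_token", "eos_token", "unk_token"):
    print(key, "=", config[key])
```

With a downloaded copy of the real file, the same `json.loads` inspection shows which tokenizer class and special tokens the model actually ships with.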

Thank you very much in advance!