Gemma tokenizer issue
#37 opened by Akshayextreme
```python
>>> from transformers import AutoTokenizer
>>> model_id = "google/gemma-2b"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> tokenizer.decode(106)
'<start_of_turn>'
>>> tokenizer.encode("<start_of_turn>", add_special_tokens=False)
[235322, 2997, 235298, 559, 235298, 15508, 235313]
>>> tokenizer.encode(tokenizer.decode(106), add_special_tokens=False)
[235322, 2997, 235298, 559, 235298, 15508, 235313]
```
What am I missing here? Ideally, `encode` should invert `decode`, so the output would look like this:

```python
>>> tokenizer.encode(tokenizer.decode(1), add_special_tokens=False)
[1]
```
Hi @Akshayextreme, sorry for the late response. Please try again with `add_special_tokens=True`. Thank you.
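For intuition, here is a toy illustration of the underlying issue (this is *not* the real Gemma/SentencePiece logic; the `SPECIALS` table, `PIECES` vocab, and both functions are hypothetical): `decode` turns a special id back into its string, but re-encoding that string splits it into ordinary pieces unless the encoder matches registered special tokens first.

```python
# Hypothetical special-token table and toy subword vocab (illustration only).
SPECIALS = {"<start_of_turn>": 106}
PIECES = {"<": 1, "start": 2, "_": 3, "of": 4, "turn": 5, ">": 6}

def encode(text, match_specials):
    """Toy encoder: optionally match whole special tokens, else greedy pieces."""
    if match_specials and text in SPECIALS:
        return [SPECIALS[text]]          # emit the single special id
    ids, i = [], 0
    while i < len(text):
        # naive greedy longest-match over the toy vocab
        for piece in sorted(PIECES, key=len, reverse=True):
            if text.startswith(piece, i):
                ids.append(PIECES[piece])
                i += len(piece)
                break
        else:
            i += 1                       # skip characters not in the vocab
    return ids

def decode(ids):
    rev = {v: k for k, v in {**PIECES, **SPECIALS}.items()}
    return "".join(rev[i] for i in ids)

print(encode(decode([106]), match_specials=False))  # split into pieces
print(encode(decode([106]), match_specials=True))   # [106], round-trips
```

The real tokenizer behaves like the `match_specials=True` branch only when the special token's string is registered and recognized during encoding; otherwise the literal text `<start_of_turn>` is tokenized character by character into ordinary pieces, which is what the ids above show.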