Fill-Mask · Transformers · PyTorch · modernbert

Update Idea: Train with Gemma3 Tokenizer?

#12
by mxngjxa - opened

I noticed that the model was trained with Gemma2's tokenizer, but since the release of google/gemma-3-270m (or any gemma-3 variant), would it not make sense to update this model with the Gemma3 tokenizer layers? That would make it possible to scale this model to 128k tokens.

mxngjxa changed discussion title from Update Idea: Train with Gemma3? to Update Idea: Train with Gemma3 Tokenizer?
Center for Language and Speech Processing @ JHU org

Hey @mxngjxa! I think there are two parts to your question; both are great ideas, but neither is trivial to do:

(1) Updating the tokenizer from Gemma2 to Gemma3. This would be awesome, but sadly it is quite hard to do, since the model is already trained and has learned the original vocabulary. If someone wanted to pre-train from scratch, though, I would recommend Gemma3's tokenizer; that is a great idea.
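As a rough illustration of why the swap is not a drop-in change, here is a minimal sketch comparing how the two tokenizers split the same string. It assumes you have accepted the Gemma license on the Hub and are logged in (both checkpoints are gated), and `google/gemma-2-2b` is just one convenient source for the Gemma2 tokenizer.

```python
from transformers import AutoTokenizer

# Gated repos: requires accepting the Gemma license and `huggingface-cli login`.
tok_g2 = AutoTokenizer.from_pretrained("google/gemma-2-2b")
tok_g3 = AutoTokenizer.from_pretrained("google/gemma-3-270m")

text = "ModernBERT encoder trained with a Gemma tokenizer"
ids_g2 = tok_g2.encode(text, add_special_tokens=False)
ids_g3 = tok_g3.encode(text, add_special_tokens=False)

print(len(tok_g2), len(tok_g3))  # vocabulary sizes (including added tokens)
print(ids_g2)
print(ids_g3)
# Even where the two vocabularies overlap, the string-to-ID mapping generally
# differs, and the trained embedding matrix is tied row-by-row to the old
# mapping. Swapping the tokenizer after training therefore scrambles which
# embedding each token looks up, which is why it only really works if you
# pre-train from scratch with the new tokenizer.
```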

(2) Re: the 128k token context length, that was done by the Gemma post-training group with their proprietary long-context data. Sadly we don't have that data, but if someone were to collect 128k-length long-context training data, you could adapt our model to that context length the same way we did with 8k-length data; a sketch of the starting point is below.
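A minimal sketch of what that adaptation might look like, purely as an assumption-laden starting point: the Hub ID is a placeholder for this model's repo, the config field names (`max_position_embeddings`, `global_rope_theta`) may differ from the actual config, and the config change alone does nothing useful without continued masked-LM training on genuinely long documents.

```python
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

model_id = "jhu-clsp/<this-model>"  # hypothetical placeholder for this repo's Hub ID

config = AutoConfig.from_pretrained(model_id)
# Extend the maximum sequence length. For RoPE-based encoders like ModernBERT,
# the rotary base (theta) is typically raised as well so positions beyond the
# original window behave sensibly. Field names here are assumptions.
config.max_position_embeddings = 131072
if hasattr(config, "global_rope_theta"):
    config.global_rope_theta = 1_000_000.0

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, config=config)

# From here, continued pre-training on 128k-token documents (e.g. with
# DataCollatorForLanguageModeling and a standard Trainer loop) is what
# actually teaches the model to use the extended window, mirroring how the
# original 8k-length adaptation was done with long training sequences.
```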
