Fine-tuning/Tokenizer

#5
by Mukhamejan - opened

Hello, I have been trying to fine-tune your model. At the time I didn't see a discussion section, so I built my own tokenizer from the text in that dataset. After fine-tuning your model with my tokenizer, I ended up with a German-sounding girl's voice, lol. I was wondering: did you convert your text to Latin or not? When I tokenize with the default tokenizer, I get something like this.
image.png

So, as I understand it, you converted the text to Latin. Could you please share the tool you used for the conversion?

Owner

Hi, Mukhamejan! No, I didn't convert the text to Latin.
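
For readers who hit the same confusion: one possible explanation is byte-level BPE, though the screenshot is not available here and the thread does not confirm which tokenizer type the model uses. Byte-level tokenizers in the GPT-2 style display each UTF-8 byte as a printable character, so Cyrillic text can look like accented Latin even though nothing was transliterated. Below is a minimal, dependency-free sketch of that byte-to-character table. The helper names `to_byte_level` and `from_byte_level` are made up for illustration and are not part of any library.

```python
def bytes_to_unicode():
    """GPT-2 style table mapping each of the 256 byte values to a printable character."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted to code points starting at 256.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_byte_level(text):
    """Hypothetical helper: show text the way a byte-level vocabulary would spell it."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def from_byte_level(symbols):
    """Hypothetical helper: invert to_byte_level and recover the original text."""
    return bytes(CHAR_TO_BYTE[c] for c in symbols).decode("utf-8")


if __name__ == "__main__":
    word = "привет"
    shown = to_byte_level(word)
    print(shown)                           # Latin-looking symbols, no Cyrillic letters
    print(from_byte_level(shown) == word)  # the mapping is lossless
```

If the default tokenizer's tokens look like the first printed line, the text is still Cyrillic underneath and decoding the token IDs should return the original string. If they look different, this explanation does not apply.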

You have done a lot of work to create this model. It has great potential, but unfortunately it places the stress on the wrong syllable in many Russian words. How can this be fixed?
