Currently working on a finetuning dataset for Japanese anime speech
If anyone is interested in a Japanese language finetuning dataset aimed at anime speech you can find it here:
https://huggingface.co/datasets/joujiboi/japanese-anime-speech
I hope we, the open-source community, do great stuff with datasets like this and others that make models perform more evenly across languages!
Post some niche non-English datasets for finetuning Whisper below!
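For anyone who wants a quick look at the dataset linked above before committing to a full download, here is a minimal sketch using the Hugging Face `datasets` library in streaming mode. The `train` split name is an assumption on my part, and `first_rows` / `stream_dataset` are just illustrative helper names, so check the dataset card if something doesn't match.

```python
from itertools import islice

# Repository ID taken from the dataset URL in this post.
REPO_ID = "joujiboi/japanese-anime-speech"


def first_rows(rows, n=3):
    """Return the first n items of any iterable without consuming the rest."""
    return list(islice(rows, n))


def stream_dataset(repo_id=REPO_ID, split="train"):
    """Open the dataset in streaming mode so nothing is downloaded up front.

    Assumption: the dataset exposes a split called "train".
    """
    # Imported lazily so first_rows() works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=True)
```

Calling `first_rows(stream_dataset())` and printing `sorted(row.keys())` for each returned row should show the actual column names, which is safer than guessing them.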
Hey @joujiboi, I'm getting interested in fine-tuning for a particular language as well, and I'm very new to this LLM domain. Do you have any resources you've used before that I could use to start my own journey?
Thanks
Firstly, I just want to make it clear that Whisper isn't a Large Language Model (a model that predicts the next word); it's an Automatic Speech Recognition model (speech to text).
If you want to finetune Whisper, OpenAI has a blog on how to do so here. If you've never finetuned a model before, it might be quite difficult, but not impossible. If you'd rather start with something easier, Stable Diffusion is a lot easier to finetune.
Sorry for the late response, I didn't see the notification!
Good luck!