openai/whisper-large-v3 · Currently working on a finetuning dataset for Japanese anime speech

Nov 7, 2023

If anyone is interested in a Japanese-language finetuning dataset aimed at anime speech, you can find it here:
https://huggingface.co/datasets/joujiboi/japanese-anime-speech
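For anyone who wants a quick look before downloading everything, here's a minimal sketch using the Hugging Face `datasets` library in streaming mode. The dataset ID comes from the URL above; the split name `"train"` and the helper name `peek_samples` are assumptions for illustration, so check the dataset page for the actual splits and columns.

```python
from itertools import islice

# Dataset ID taken from the URL in the post above.
DATASET_ID = "joujiboi/japanese-anime-speech"


def peek_samples(n=3, split="train"):
    """Return the first n samples without downloading the whole dataset.

    Assumption: the dataset exposes a split named "train".
    """
    # Imported lazily so this file loads even if `datasets` is not installed.
    from datasets import load_dataset  # pip install datasets

    # streaming=True yields samples one at a time instead of fetching all files.
    stream = load_dataset(DATASET_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek_samples()` and printing `sample.keys()` for each result should show which columns are available, without assuming their names in advance.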

I hope we, the open-source community, do great stuff with datasets like this and others that make models perform more evenly across languages!

Post some niche non-English datasets for finetuning Whisper below!

poppysmickarlili

Dec 8, 2023

Hey @joujiboi, I'm getting interested in finetuning for a particular language as well, and I'm very new to this LLM domain. Do you have any resources you've used before that I could use to get started?

thanks

joujiboi

Dec 15, 2023

Hey @poppysmickarlili

First, I just want to make it clear that Whisper isn't a Large Language Model (a model that predicts the next word); it's an Automatic Speech Recognition model (speech to text).
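To make the "speech to text" part concrete, here's a rough sketch of what using Whisper looks like with the `transformers` pipeline: audio goes in, a transcript comes out. The model ID is the one this discussion is attached to; the helper name `transcribe` and the audio path are placeholders, and decoding an audio file from a path typically also needs ffmpeg installed.

```python
# Model ID of the repository this discussion belongs to.
MODEL_ID = "openai/whisper-large-v3"


def transcribe(audio_path, language="japanese"):
    """Transcribe one audio file to text (hypothetical helper for illustration)."""
    # Imported lazily so this file loads even if `transformers` is not installed.
    from transformers import pipeline  # pip install transformers torch

    # Build a speech recognition pipeline; the first call downloads the model.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)

    # Tell Whisper which language to expect and to transcribe rather than translate.
    result = asr(
        audio_path,
        generate_kwargs={"language": language, "task": "transcribe"},
    )
    return result["text"]
```

Something like `transcribe("clip.wav")` would return the transcript as a plain string, where `clip.wav` stands in for any local audio file.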

If you want to finetune Whisper, OpenAI has a blog on how to do so here. If you've never finetuned a model before, it might be quite difficult, but not impossible. If you'd rather start with something easier to finetune, try Stable Diffusion.

Sorry for the late response, I didn't see the notification!

Good luck!