deepdml/whisper-large-v3-turbo · Efficiency of training Whisper-large-v3-turbo

Thank you so much for this efficient and impressive work! 😊

I'm relatively new to this type of model, and I have a few questions. I want to train a large Whisper model tailored to healthcare-related vocabulary and then use it for real-time voice processing.

In this context, do you think it would be better to train the large V3 turbo model directly, or to train the regular large V3 version and then distill it myself for better performance? Whichever you recommend, could you explain why that approach would be preferable?

Thanks again for the quick implementation of the large V3 turbo model. It's much appreciated!

Best regards.