Distil version does a bad job at transcribing

#2
by arslankas - opened

The transcript on the left is from Distil-Whisper and the one on the right is from Whisper. The audio is a one-minute conversation between an agent and a customer. Distil-Whisper took 0.7 to transcribe it, while Whisper took 3.4, but Whisper's output was very close to the original audio. I then used a diff site to compare the two texts, and as you can see there's a clear difference in transcription quality.
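A visual diff shows *where* the transcripts differ; a common way to put a single number on transcription quality is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in a reference transcript. Below is a minimal stdlib sketch, assuming plain whitespace tokenization and a ground-truth reference transcript (the thread doesn't include one). The function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    # Guard against an empty reference to avoid dividing by zero.
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


# Hypothetical example: the hypothesis drops one of eight reference words.
reference = "thank you for calling how can I help"
hypothesis = "thank you for calling how I help"
print(word_error_rate(reference, hypothesis))  # 0.125
```

Scoring each model's output against the same reference would make a comparison like the one above reproducible rather than judged by eye.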
[image.png: diff of the Distil-Whisper (left) and Whisper (right) transcripts]

Whisper Distillation org

Hi @arslankas - Thanks for reporting this issue, this is definitely not the intended behaviour. Is it possible for you to share the audio file along with the code snippet and hyper-parameters you used?

Sure. Here's the audio:

I just used your interface to upload the audio, copied the resulting text, and then went to https://www.diffchecker.com/ to check the difference.
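The same kind of comparison can also be done locally. The sketch below is an assumption-laden alternative to the diff site, not what was used in the thread: it runs Python's `difflib.ndiff` over the words of two transcripts and keeps only the words that were removed (`-`) or added (`+`). The `word_diff` helper and the sample strings are hypothetical.

```python
import difflib


def word_diff(left: str, right: str) -> list[str]:
    """Return word-level differences between two transcripts.

    Entries prefixed with '- ' appear only in `left`; '+ ' only in `right`.
    """
    diff = difflib.ndiff(left.split(), right.split())
    # Drop unchanged words ('  ') and intra-word hint lines ('? ').
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical transcripts from two models for the same audio.
transcript_a = "please send it to Randal.thomas at gmail.com"
transcript_b = "please send it to randal.thomas at gmail.com"
for line in word_diff(transcript_a, transcript_b):
    print(line)
```

Diffing words rather than whole lines keeps a single misheard word from marking an entire sentence as changed.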

Whisper Distillation org

Hey @arslankas - thanks for reporting. This does indeed look like an instance of distil-whisper doing a worse job at transcribing. I've run the transcription with distil-large-v3 (which will be released very soon) and the transcription accuracy looks to be better:

[Screenshot 2024-02-22 at 13.35.05.png: diff including the distil-large-v3 transcript]

The main remaining differences are formatting, such as "Randal.thomas at gmail.com" vs "randal.thomas at gmail.com". These don't change the meaning of the text, and arguably both are correct. Stay tuned to the distil-whisper organisation on the Hub for updates regarding distil-large-v3.
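Differences like this are usually removed by normalizing both transcripts before comparing them, so that casing and stray punctuation aren't counted as errors. The sketch below shows the idea under simple assumptions: it lowercases text and strips punctuation only from the edges of each word, keeping inner characters such as the "." in an email address. The normalizers used in Whisper evaluations are far more thorough. The `normalize` helper here is hypothetical.

```python
import string


def normalize(text: str) -> str:
    """Lowercase text and strip leading/trailing punctuation from each word.

    Deliberately simple: inner punctuation (e.g. the '.' in an email) is kept.
    """
    words = (w.strip(string.punctuation) for w in text.lower().split())
    # Drop tokens that were pure punctuation and became empty.
    return " ".join(w for w in words if w)


a = "Send it to Randal.thomas at gmail.com."
b = "send it to randal.thomas at gmail.com"
print(normalize(a) == normalize(b))  # True
```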
