Distil version does a bad job at transcribing
The transcript on the left is from Distil-Whisper, while the one on the right is from Whisper alone. The audio was a one-minute conversation between an agent and a customer. Distil-Whisper took 0.7 versus 3.4 for Whisper, but Whisper's output was much closer to the original audio. I then used a diff site to compare the two texts, and as you can see there's clearly a difference in transcription quality.
Hi @arslankas - Thanks for reporting this issue; this is definitely not the intended behaviour. Is it possible for you to share the audio file along with the code snippet and hyper-parameters you used?
Sure. Here's the audio:
I just used your interface to upload the audio, got the text, and then went to https://www.diffchecker.com/ to check the difference.
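If you'd rather compare the two transcripts locally than paste them into a web diff tool, Python's standard-library `difflib` can produce a word-level diff. This is a minimal sketch: the helper name `word_diff` and the sample transcripts are hypothetical and not taken from the audio in this thread. Comparison is word by word, so case and punctuation differences are reported too.

```python
import difflib


def word_diff(reference: str, hypothesis: str) -> list[tuple[str, str, str]]:
    """Return (tag, reference_words, hypothesis_words) for each differing span.

    Tags come from difflib.SequenceMatcher: 'replace', 'delete' or 'insert'.
    Words are split on whitespace, so case and punctuation are significant.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    # autojunk=False keeps frequent words (e.g. "the") from being ignored on long inputs.
    matcher = difflib.SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, " ".join(ref_words[i1:i2]), " ".join(hyp_words[j1:j2])))
    return diffs


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    whisper_text = "thank you for calling how can I help you today"
    distil_text = "thank you for calling how can help you to day"
    for tag, ref, hyp in word_diff(whisper_text, distil_text):
        print(f"{tag:8} {ref!r} -> {hyp!r}")
```

Here the Whisper output is treated as the reference; with a human-made transcript of the call you would use that instead.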
Hey @arslankas, thanks for reporting. This does indeed look like an instance of distil-whisper doing a worse job at transcribing. I've run the transcription with distil-large-v3 (which will be released very soon) and the transcription accuracy looks to be better:
The main differences now are to do with formatting ("Randal.thomas at gmail.com" vs "randal.thomas at gmail.com"), which don't change the actual meaning of the text, and it can be argued that both are correct. Stay tuned to the distil-whisper organisation on the Hub for updates regarding distil-large-v3.
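To put a number on a quality gap while ignoring formatting-only differences like the capitalisation above, one common approach is word error rate (WER), (substitutions + deletions + insertions) / reference words, computed after normalising the text. This is a minimal sketch, assuming a deliberately simple normaliser (lowercase, punctuation replaced by spaces); the function names are hypothetical, and this is not the normaliser or evaluation code used by the distil-whisper authors.

```python
import re


def normalize(text: str) -> list[str]:
    """Hypothetical, deliberately simple normaliser: lowercase and drop punctuation."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str, normalize_text: bool = True) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = normalize(reference) if normalize_text else reference.split()
    hyp = normalize(hypothesis) if normalize_text else hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion (reference word missing)
                curr[j - 1] + 1,          # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),   # substitution, or free match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    a = "Randal.thomas at gmail.com"
    b = "randal.thomas at gmail.com"
    print(f"normalised WER: {word_error_rate(a, b):.3f}")
    print(f"raw WER:        {word_error_rate(a, b, normalize_text=False):.3f}")
```

Without normalisation the capitalisation difference counts as one substitution out of three words; after normalisation the two strings score as identical.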