@Xenova on Hugging Face: "Introducing Whisper WebGPU: Blazingly-fast ML-powered speech recognition…"

Xenova

posted an update Jun 9, 2024

Introducing Whisper WebGPU: Blazingly-fast ML-powered speech recognition directly in your browser! 🚀 It supports multilingual transcription and translation across 100 languages! 🤯

The model runs locally, meaning no data leaves your device! 😍

Check it out! 👇
- Demo: Xenova/whisper-webgpu
- Source code: https://github.com/xenova/whisper-web/tree/experimental-webgpu

Best-codes

Jun 9, 2024

I love this. I think it would be very cool if we could get a WebGPU model running that could differentiate between different speakers in an audio sample (e.g., “Person 1, Person 2”).

Keep up the good work!

nmstoker

Jun 9, 2024

Amazing progress, thank you!

For anyone trying this on Chrome on Linux, you may need to set some flags. When the model loading gets stuck, a comment appears in the Console suggesting a startup flag, but you can just switch the flags directly in chrome://flags and relaunch. For me, I needed to enable both Vulkan and Unsafe WebGPU, and then it works (seriously fast, I should note!!)
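
To tell a missing WebGPU API apart from a missing adapter before the model load stalls, a small check along these lines may help. This is only a sketch: `checkWebGPU`, `hasWebGPUApi`, `NavigatorLike`, and the status strings are hypothetical names made up for illustration. The only browser call it relies on is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter is found.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
// These interfaces are illustrative, not part of any library.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

type WebGPUStatus = "ok" | "no-api" | "no-adapter";

// True when the browser exposes the WebGPU entry point at all.
function hasWebGPUApi(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined;
}

// Distinguishes "API not exposed" from "API exposed but no usable adapter".
async function checkWebGPU(nav: NavigatorLike): Promise<WebGPUStatus> {
  if (!nav.gpu) {
    return "no-api"; // WebGPU is not exposed in this browser or context
  }
  const adapter = await nav.gpu.requestAdapter();
  // A null adapter means the API exists but no GPU backend was available.
  return adapter ? "ok" : "no-adapter";
}
```

In a browser you would pass the global `navigator` (a cast may be needed depending on your TypeScript DOM typings). A result other than `"ok"` could be a hint that flags such as the ones mentioned above need enabling, though the exact cause will vary by browser and driver.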

davidmoore-io

Jun 10, 2024

Oh dear God, I love this. ❤️

ntp777

Jun 11, 2024

@Xenova can I ask how I could use this to transcribe speech and store the text? At the moment when I speak it recognizes everything perfectly, but it seems to only keep a limited set of sentences before overwriting them. Is there a way to transcribe and store locally at all?
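
One possible approach, sketched here without knowledge of how the demo manages its own buffer: keep an append-only log of each transcribed segment in your own code, so nothing is lost when the display is overwritten, and persist it to local storage. `TranscriptLog`, `StorageLike`, and the storage key are hypothetical names, and how you obtain each segment from the demo is an assumption left to the caller.

```typescript
// Anything with a setItem method works here, e.g. window.localStorage.
interface StorageLike {
  setItem(key: string, value: string): void;
}

// Append-only log of transcribed segments (illustrative, not from the demo).
class TranscriptLog {
  private readonly segments: string[] = [];

  // Add one finished segment; empty or whitespace-only text is skipped.
  append(text: string): void {
    const trimmed = text.trim();
    if (trimmed.length > 0) {
      this.segments.push(trimmed);
    }
  }

  // The full transcript so far, one segment per line.
  toText(): string {
    return this.segments.join("\n");
  }

  // Write the full transcript under the given key.
  save(storage: StorageLike, key: string): void {
    storage.setItem(key, this.toText());
  }
}
```

In a browser, calling `log.save(window.localStorage, "transcript")` after each `append` would keep the text on the device across page reloads.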

hammeiam

Jun 14, 2024

Can you share a model on HF that is both exported to ONNX and supports word-level timestamps?

chaosbasicly

Jun 14, 2024

Hello Xenova, I downloaded your YOLOS ONNX weights and ran inference, but the results are very bad, or maybe my decoding script is wrong. Can you share your YOLOS inference script?

nmstoker

Jun 15, 2024

Any chance of some pointers on how to use the model in plain JavaScript? I had a look at the React/TS code and wasn't sure where to begin (I'm not the strongest JS coder, so I may be missing something obvious!)
