distil-whisper/distil-medium.en

sanchit-gandhi (HF staff) committed on Nov 9, 2023

Commit 9e3e0be • 1 Parent(s): c7bea3d

whisper cpp

Files changed (1)

README.md +30 -4

@@ -263,6 +263,36 @@ To transcribe a local audio file, simply pass the path to the audio file as the
 pred_out = transcribe(model, audio="audio.mp3")
 ```
+### Whisper.cpp
+Distil-Whisper can be run from the [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) repository with the original
+sequential long-form transcription algorithm. In a [provisional benchmark](https://github.com/ggerganov/whisper.cpp/pull/1424#issuecomment-1793513399)
+on Mac M1, `distil-medium.en` is 4x faster than `large-v2`, while performing to within 1% WER over long-form audio.
+Steps for getting started:
+1. Clone the Whisper.cpp repository:
+```
+git clone https://github.com/ggerganov/whisper.cpp.git
+cd whisper.cpp
+```
+2. Download the ggml weights for `distil-medium.en` from the Hugging Face Hub:
+```bash
+python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='distil-whisper/distil-medium.en', filename='ggml-medium-32-2.en.bin', local_dir='./models')"
+```
+Note that if you do not have the `huggingface_hub` package installed, you can also download the weights with `wget`:
+```bash
+wget https://huggingface.co/distil-whisper/distil-medium.en/resolve/main/ggml-medium-32-2.en.bin -P ./models
+```
+3. Run inference using the provided sample audio:
+```bash
+make -j && ./main -m models/ggml-medium-32-2.en.bin -f samples/jfk.wav
+```
 ### Transformers.js
 ```js
@@ -312,10 +342,6 @@ cargo run --example whisper --release -- --model distil-medium.en --input audio.
 Coming soon ...
-### Whisper.cpp
-Coming soon ...
 ## Model Details
 Distil-Whisper inherits the encoder-decoder architecture from Whisper. The encoder maps a sequence of speech vector
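
For readers who would rather script steps 2 and 3 of the new Whisper.cpp section, the following is a minimal Python sketch, not part of the commit. It assumes it is run from the root of a cloned `whisper.cpp` checkout, that `make` is available, and that `huggingface_hub` is installed. The helper names `download_weights`, `build_command`, and `transcribe` are hypothetical and chosen for illustration; the repository ID, weight filename, and `./main` flags are taken from the README text in the diff.

```python
import subprocess
from pathlib import Path

# Values copied from the README steps in the diff.
REPO_ID = "distil-whisper/distil-medium.en"
WEIGHTS = "ggml-medium-32-2.en.bin"


def download_weights(models_dir="./models"):
    """Step 2: fetch the ggml weights from the Hugging Face Hub."""
    # Imported lazily so the other helpers work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=REPO_ID, filename=WEIGHTS, local_dir=models_dir))


def build_command(model_path, audio_path="samples/jfk.wav"):
    """Step 3: the `./main` invocation from the README, as an argument list."""
    return ["./main", "-m", str(model_path), "-f", str(audio_path)]


def transcribe(model_path, audio_path="samples/jfk.wav", cwd="."):
    """Build whisper.cpp, run it on one audio file, and return its standard output."""
    # Equivalent to the `make -j` half of the README command.
    subprocess.run(["make", "-j"], cwd=cwd, check=True)
    result = subprocess.run(
        build_command(model_path, audio_path),
        cwd=cwd,
        check=True,  # raise if ./main exits with a non-zero status
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    weights = download_weights()
    print(transcribe(weights))
```

Passing the command as an argument list rather than a shell string avoids quoting problems when the audio path contains spaces.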