language:
  - en
tags:
  - audio
  - automatic-speech-recognition
license: mit

Distil-Whisper: distil-large-v3 for OpenAI Whisper

This repository contains the model weights for distil-large-v3 converted to OpenAI Whisper format.

Compared to previous Distil-Whisper releases, distil-large-v3 is specifically designed to be compatible with the OpenAI Whisper long-form transcription algorithm. In our benchmark over 4 out-of-distribution datasets, distil-large-v3 outperformed distil-large-v2 by an average of 5% WER. You can therefore expect significant performance gains by switching to this latest checkpoint.

Python Usage

To use the model in the original Whisper format, first ensure you have the openai-whisper package installed. For this example, we'll also install 🤗 Datasets to load a toy audio dataset from the Hugging Face Hub:

pip install --upgrade pip
pip install --upgrade openai-whisper datasets[audio]

The following code snippet demonstrates how to transcribe a sample file from the LibriSpeech dataset, loaded using 🤗 Datasets:

from huggingface_hub import hf_hub_download
from datasets import load_dataset
from whisper import load_model, transcribe

# download the converted model weights from the Hugging Face Hub and load them
model_path = hf_hub_download(repo_id="distil-whisper/distil-large-v3-openai", filename="model.bin")
model = load_model(model_path)

# load a toy audio dataset and take the file path of the first sample
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]["path"]

# transcribe with the original OpenAI Whisper inference code
pred_out = transcribe(model, audio=sample, language="en")
print(pred_out["text"])

Note that the model weights are downloaded and saved to your cache the first time you run the example. On subsequent runs, the weights are loaded directly from the cache, so no download is required.
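If you would rather keep the weights in a specific directory instead of the default Hugging Face cache, hf_hub_download accepts a local_dir argument. A minimal sketch (the distil-large-v3 directory name is just an example):

from huggingface_hub import hf_hub_download
from whisper import load_model

# download (or re-use) the weights in a directory of your choosing;
# "distil-large-v3" is an arbitrary example path
model_path = hf_hub_download(
    repo_id="distil-whisper/distil-large-v3-openai",
    filename="model.bin",
    local_dir="distil-large-v3",
)
model = load_model(model_path)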

To transcribe a local audio file, simply pass the path to the audio file as the audio argument to transcribe:

pred_out = transcribe(model, audio="audio.mp3", language="en")
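In addition to the full transcription under the "text" key, the output dictionary contains segment-level timestamps under the "segments" key, which can be handy for long-form audio. A short sketch, continuing from the snippet above:

# each segment carries start/end times (in seconds) and the decoded text
for segment in pred_out["segments"]:
    print(f"[{segment['start']:.2f}s -> {segment['end']:.2f}s] {segment['text']}")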

CLI Usage

The Distil-Whisper model can also be used with the OpenAI Whisper CLI. First, pip install the Hugging Face Hub package:

pip install --upgrade huggingface_hub

Next, download the weights for distil-large-v3 locally:

huggingface-cli download distil-whisper/distil-large-v3-openai model.bin --local-dir distil-large-v3

Finally, use the OpenAI Whisper CLI to transcribe:

whisper audio.mp3 --model distil-large-v3/model.bin --language en
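The standard Whisper CLI options also apply. For example, to save the transcription as a text file in a directory of your choosing (the transcripts directory name below is just an example):

whisper audio.mp3 --model distil-large-v3/model.bin --language en --output_dir transcripts --output_format txt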

Model Details

For more information about the distil-large-v3 model, refer to the original model card.

License

Distil-Whisper inherits the MIT license from OpenAI's Whisper model.

Citation

If you use this model, please consider citing the Distil-Whisper paper:

@misc{gandhi2023distilwhisper,
      title={Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling}, 
      author={Sanchit Gandhi and Patrick von Platen and Alexander M. Rush},
      year={2023},
      eprint={2311.00430},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}