---
language:
- en
tags:
- audio
- automatic-speech-recognition
license: mit
---
# Distil-Whisper: distil-large-v3 for OpenAI Whisper
This repository contains the model weights for distil-large-v3 converted to the OpenAI Whisper format.

Compared to previous Distil-Whisper releases, distil-large-v3 is specifically designed to be compatible with the OpenAI Whisper long-form transcription algorithm. In our benchmark over 4 out-of-distribution datasets, distil-large-v3 outperformed distil-large-v2 by an average of 5% WER. Thus, you can expect significant performance gains by switching to this latest checkpoint.
## Python Usage
To use the model in the original Whisper format, first ensure you have the `openai-whisper` package installed. For this example, we'll also install 🤗 Datasets to load a toy audio dataset from the Hugging Face Hub:

```bash
pip install --upgrade pip
pip install --upgrade openai-whisper datasets[audio]
```
The following code snippet demonstrates how to transcribe a sample file from the LibriSpeech dataset loaded using 🤗 Datasets:

```python
from huggingface_hub import hf_hub_download
from datasets import load_dataset
from whisper import load_model, transcribe

# Download the converted weights from the Hugging Face Hub and load them
model_path = hf_hub_download(repo_id="distil-whisper/distil-large-v3-openai", filename="model.bin")
model = load_model(model_path)

# Load a toy audio dataset and take the first sample's file path
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]["path"]

# Run transcription and print the predicted text
pred_out = transcribe(model, audio=sample, language="en")
print(pred_out["text"])
```
Note that the model weights are downloaded and saved to your cache the first time you run the example. On subsequent runs, the weights are loaded directly from the cache without being downloaded again.
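If you prefer to keep the weights somewhere other than the default Hugging Face cache, `hf_hub_download` accepts a `cache_dir` argument. A minimal sketch, where the directory name is purely illustrative:

```python
from huggingface_hub import hf_hub_download

# Store (and later re-load) the weights under an explicit directory
# instead of the default ~/.cache/huggingface location.
model_path = hf_hub_download(
    repo_id="distil-whisper/distil-large-v3-openai",
    filename="model.bin",
    cache_dir="./weights",  # illustrative path; any writable directory works
)
```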
To transcribe a local audio file, simply pass the path to the audio file as the `audio` argument to `transcribe`:

```python
pred_out = transcribe(model, audio="audio.mp3", language="en")
```
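The `transcribe` function also returns segment-level timestamps and accepts the standard openai-whisper decoding options. A short sketch, assuming the model from the example above is already loaded:

```python
# Request word-level timestamps and use the standard temperature
# fallback schedule for more robust decoding.
pred_out = transcribe(
    model,
    audio="audio.mp3",
    language="en",
    word_timestamps=True,
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
)

# Each segment carries its own start/end times alongside the text
for segment in pred_out["segments"]:
    print(f"[{segment['start']:.2f}s -> {segment['end']:.2f}s] {segment['text']}")
```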
## CLI Usage
The Distil-Whisper model can also be used with the OpenAI Whisper CLI. First, install the Hugging Face Hub package:

```bash
pip install --upgrade huggingface_hub
```
Next, download the weights for distil-large-v3 locally:

```bash
huggingface-cli download distil-whisper/distil-large-v3-openai model.bin --local-dir distil-large-v3
```
Finally, use the OpenAI Whisper CLI to transcribe:

```bash
whisper audio.mp3 --model distil-large-v3/model.bin --language en
```
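The CLI exposes the same decoding options as the Python API. For example, to write the transcription as SRT subtitles to a chosen directory (standard openai-whisper CLI flags; the paths are illustrative):

```bash
# Write an .srt subtitle file with timestamps to ./transcripts
whisper audio.mp3 --model distil-large-v3/model.bin --language en \
    --output_format srt --output_dir transcripts
```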
## Model Details
For more information about the distil-large-v3 model, refer to the original model card.
## License
Distil-Whisper inherits the MIT license from OpenAI's Whisper model.
## Citation
If you use this model, please consider citing the Distil-Whisper paper:
```bibtex
@misc{gandhi2023distilwhisper,
  title={Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling},
  author={Sanchit Gandhi and Patrick von Platen and Alexander M. Rush},
  year={2023},
  eprint={2311.00430},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```