Whisper large-v3 model for Kinyarwanda

This repository contains the fine-tuned model leophill/whisper-large-v3-sn-kinyarwanda, which was developed for the Kinyarwanda Automatic Speech Recognition Track A challenge organized on Kaggle by Digital Umuganda. The challenge dataset comprises 500 hours of labeled Kinyarwanda speech spanning five high-impact domains (Health, Government, Financial Services, Education, and Agriculture), intended to support the development of ASR models that are robust in both conversational and formal contexts.

This model uses Shona (sn) as a proxy language, because Kinyarwanda is not among the languages supported by the pretrained Whisper models.

The model supports capitalization and punctuation.
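Because the transcriptions include capitalization and punctuation, you may want plain lowercase text when comparing them against unpunctuated references (for example, when computing a word error rate). The following is a minimal sketch of such a normalization using only the Python standard library. The `normalize_transcript` helper is hypothetical: it is not the normalization used by the Kaggle challenge or shipped with this model, and keeping the ASCII apostrophe by default (so elided Kinyarwanda forms such as "n'umuntu" stay intact) is an assumption.

```python
import re
import unicodedata


def normalize_transcript(text: str, keep: str = "'") -> str:
    """Hypothetical helper: casefold, strip punctuation, collapse whitespace.

    Characters listed in `keep` are preserved even if they are punctuation.
    """
    # Compose Unicode characters consistently, then casefold (stricter than lower())
    text = unicodedata.normalize("NFC", text).casefold()
    # Replace every punctuation character (Unicode category P*) with a space,
    # except the characters the caller asked to keep
    text = "".join(
        " " if unicodedata.category(ch).startswith("P") and ch not in keep else ch
        for ch in text
    )
    # Collapse runs of whitespace into single spaces and trim both ends
    return re.sub(r"\s+", " ", text).strip()


print(normalize_transcript("Muraho, amakuru yawe?  Ni meza."))  # → muraho amakuru yawe ni meza
```

Replacing punctuation with spaces (rather than deleting it) avoids accidentally gluing two words together when the model omits a space after a punctuation mark.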

Usage

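To run the model, first install the torch and transformers libraries. Depending on your transformers version, the low_cpu_mem_usage=True option used below may also require the accelerate library, and transcribing audio files by path requires ffmpeg to be installed on your system.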

The model can be used with the pipeline class to transcribe audio of arbitrary length, including local audio files:

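import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
# Use the first GPU in half precision if available, otherwise fall back to CPU in float32
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = "leophill/whisper-large-v3-sn-kinyarwanda"
# Load the fine-tuned weights (safetensors format) and move them to the selected device
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
# The processor bundles the tokenizer and the audio feature extractor
processor = AutoProcessor.from_pretrained(model_id)
# Build an ASR pipeline around the model, tokenizer, and feature extractor
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)
# Transcribe a local file, passing Shona as the proxy language for Kinyarwanda
audio_file = "audio.wav"
result = pipe(audio_file, generate_kwargs={"language": "shona", "task": "transcribe"})
print(result["text"])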
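Besides file paths, the transformers ASR pipeline also accepts an in-memory dictionary with a "raw" NumPy array and its "sampling_rate", which is convenient when audio is already loaded or when ffmpeg is not available. The following is a minimal sketch that reads a 16-bit PCM WAV file with Python's standard `wave` module and builds that dictionary; `load_wav_pcm16` is a hypothetical helper, and the restriction to 16-bit PCM is an assumption of this sketch. Whisper's feature extractor expects 16 kHz audio; if the file uses another rate, the pipeline will try to resample it, which may require torchaudio depending on your version.

```python
import wave

import numpy as np


def load_wav_pcm16(path: str) -> dict:
    """Hypothetical helper: read a 16-bit PCM WAV file into the
    {"raw": ..., "sampling_rate": ...} dict accepted by the ASR pipeline."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only supports 16-bit PCM WAV files")
        n_channels = wf.getnchannels()
        sampling_rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    # Interpret the bytes as little-endian signed 16-bit samples
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32)
    # Samples are interleaved per frame; average the channels to get mono audio
    samples = samples.reshape(-1, n_channels).mean(axis=1)
    # Scale integer samples to the [-1.0, 1.0) float range Whisper expects
    return {"raw": samples / 32768.0, "sampling_rate": sampling_rate}
```

With the pipeline defined above, the result can be passed in place of the file path, for example `pipe(load_wav_pcm16("audio.wav"), generate_kwargs={"language": "shona", "task": "transcribe"})`.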

More information

For more information about the original Whisper large-v3 model, see its model card.

CTranslate2 version

A CTranslate2 version of this model is available on a dedicated model page.

Citation

@misc{whisper_lv3_sn_turbo_kinyarwanda_asr,
  author = {Leopold Hillah},
  title = {Finetuning Whisper Large V3 Turbo for Kinyarwanda ASR using Shona as Proxy Language},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/leophill/whisper-large-v3-sn-kinyarwanda}
}

@misc{kinyarwanda-automatic-speech-recognition-track-a,
  author = {Digital Umuganda},
  title = {Kinyarwanda Automatic Speech Recognition Track A},
  year = {2025},
  howpublished = {\url{https://kaggle.com/competitions/kinyarwanda-automatic-speech-recognition-track-a}},
  note = {Kaggle}
}
