Automatic Speech Recognition for Kinyarwanda


Model Description

This model is a fine-tuned version of Wav2Vec2-BERT 2.0 for automatic speech recognition (ASR) in Kinyarwanda. It was trained on the Kinyarwanda ASR Track A dataset, which covers the Health, Government, Finance, Education, and Agriculture domains.

  • Developed by: Badr al-Absi
  • Model type: Speech Recognition (ASR)
  • Language: Kinyarwanda (rw)
  • License: MIT
  • Finetuned from: facebook/w2v-bert-2.0


Direct Use

The model can be used directly for automatic speech recognition of Kinyarwanda audio:

from transformers import Wav2Vec2BertProcessor, Wav2Vec2BertForCTC
import torch
import torchaudio

# load model and processor
processor = Wav2Vec2BertProcessor.from_pretrained("badrex/w2v-bert-2.0-kinyarwanda-asr")
model = Wav2Vec2BertForCTC.from_pretrained("badrex/w2v-bert-2.0-kinyarwanda-asr")

# load audio
audio_input, sample_rate = torchaudio.load("path/to/audio.wav")

# downmix to mono and resample to 16 kHz, the rate the model expects
audio_input = audio_input.mean(dim=0)
if sample_rate != 16000:
    audio_input = torchaudio.functional.resample(audio_input, sample_rate, 16000)

# preprocess
inputs = processor(audio_input.numpy(), sampling_rate=16000, return_tensors="pt")

# inference
with torch.no_grad():
    logits = model(**inputs).logits

# decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
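
Alternatively, the transformers pipeline API bundles loading, preprocessing, and CTC decoding into a single call. A minimal sketch (decoding an audio file this way requires ffmpeg to be installed):

from transformers import pipeline

# the pipeline wraps the processor and model used above
asr = pipeline(
    "automatic-speech-recognition",
    model="badrex/w2v-bert-2.0-kinyarwanda-asr",
)

print(asr("path/to/audio.wav")["text"])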

Downstream Use

This model can be used as a foundation for:

  • building voice assistants for Kinyarwanda speakers
  • transcription services for Kinyarwanda content (see the long-form sketch after this list)
  • accessibility tools for Kinyarwanda-speaking communities
  • research in low-resource speech recognition
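
For longer recordings, as in a transcription service, the same pipeline API can transcribe in overlapping chunks and stitch the per-chunk CTC outputs back together. A sketch; the chunk and stride lengths below are illustrative values, not tuned for this model:

from transformers import pipeline

# split long audio into 30 s windows with 5 s of context on each side;
# the pipeline merges the chunk outputs into a single transcript
asr = pipeline(
    "automatic-speech-recognition",
    model="badrex/w2v-bert-2.0-kinyarwanda-asr",
    chunk_length_s=30,
    stride_length_s=5,
)

print(asr("path/to/long_recording.wav")["text"])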

Out-of-Scope Use

  • transcribing languages other than Kinyarwanda
  • real-time applications without proper latency testing (see the latency sketch after this list)
  • high-stakes applications without domain-specific validation
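
Before any real-time deployment, it is worth measuring the real-time factor on the target hardware. A minimal sketch, reusing model, inputs, and the 16 kHz audio_input from the Direct Use example above:

import time

# assumes `model`, `inputs`, and the 16 kHz `audio_input` from the Direct Use example
duration_s = audio_input.shape[-1] / 16000

start = time.perf_counter()
with torch.no_grad():
    logits = model(**inputs).logits
elapsed = time.perf_counter() - start

# a real-time factor below 1.0 means transcription is faster than playback
print(f"real-time factor: {elapsed / duration_s:.2f}")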

Bias, Risks, and Limitations

  • Domain bias: primarily trained on formal speech from specific domains (Health, Government, Finance, Education, Agriculture)
  • Accent variation: may not perform well on dialects or accents not represented in training data
  • Audio quality: performance may degrade on noisy or low-quality audio
  • Technical terms: may struggle with specialized vocabulary outside training domains

Training Data

The model was fine-tuned on the Kinyarwanda ASR Track A dataset:

  • Size: ~500 hours of transcribed Kinyarwanda speech
  • Domains: Health, Government, Finance, Education, Agriculture
  • Source: Digital Umuganda (funded by the Gates Foundation)
  • License: CC BY 4.0

Model Architecture

  • Base model: Wav2Vec2-BERT 2.0
  • Architecture: transformer-based with convolutional feature extractor
  • Parameters: ~580M (inherited from the base model)
  • Objective: connectionist temporal classification (CTC); see the decoding sketch after this list
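
To make the CTC objective concrete: the model emits one prediction per audio frame, and greedy decoding collapses consecutive repeats and removes a special blank token. A toy sketch; the token ids and blank id below are illustrative, not this model's actual vocabulary:

# toy CTC greedy collapse: merge consecutive repeats, then drop blanks
def ctc_collapse(ids, blank_id=0):
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

# frame-level ids collapse to [5, 2, 7, 7, 9]; the blank between
# the two 7s is what lets CTC produce a doubled symbol
assert ctc_collapse([5, 5, 0, 2, 0, 7, 7, 0, 7, 9]) == [5, 2, 7, 7, 9]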


Citation

@misc{w2v_bert_kinyarwanda_asr,
  author = {Badr M. Abdullah},
  title = {Adapting Wav2Vec2-BERT 2.0 for Kinyarwanda ASR},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/badrex/w2v-bert-2.0-kinyarwanda-asr}
}

@misc{kinyarwanda_asr_track_a,
  author = {Digital Umuganda},
  title = {Kinyarwanda Automatic Speech Recognition Track A},
  year = {2025},
  url = {https://www.kaggle.com/competitions/kinyarwanda-automatic-speech-recognition-track-a}
}

Model Card Contact

For questions or issues, please reach out via the Hugging Face model repository.
