Kiswahili Sahihi ASR - Swahili Audio Transcription
This model enables high-quality, long-form Kiswahili speech transcription from multiple audio formats (e.g., .mp3, .wav, .m4a, .aac, .ogg, .flac, .amr) using a simple, efficient pipeline.
It's optimized for speed, accuracy, and real-world usability, even on modest hardware.
Key Features
- Supports multiple audio formats via FFmpeg + Pydub
- Built on Hugging Face Transformers
- Automatically converts audio to 16 kHz mono
- Transcribes long recordings by splitting them into fixed-length chunks (default: 60 s per chunk)
- Works on both CPU and GPU
- Focused on Kiswahili language transcription
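The fixed-length chunking listed above can be sketched in a few lines. This is a minimal illustration that assumes non-overlapping windows, as in the example script; `chunk_bounds` is a hypothetical helper name, not part of the model or of Transformers.

```python
import math

def chunk_bounds(num_samples, sample_rate=16000, chunk_length_s=60):
    """Return (start, end) sample indices for fixed-size, non-overlapping chunks."""
    chunk_size = chunk_length_s * sample_rate
    num_chunks = math.ceil(num_samples / chunk_size)
    return [
        (i * chunk_size, min((i + 1) * chunk_size, num_samples))
        for i in range(num_chunks)
    ]

# A 150-second recording at 16 kHz yields two full chunks and one 30-second tail.
bounds = chunk_bounds(150 * 16000)
print(bounds)  # [(0, 960000), (960000, 1920000), (1920000, 2400000)]
```

Only the last chunk can be shorter than the chosen length, which is why the end index is clamped to the number of samples.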
Example using the model
# ============================================
# Full Swahili Audio Transcription Script
# ============================================
# Install dependencies (notebook syntax; drop the leading "!" in a regular shell)
!pip install transformers
!pip install "datasets<4.0.0"
!pip install torchvision==0.21.0 torchaudio==2.6.0 jiwer evaluate
# Quote version specifiers so the shell does not treat ">" as a redirect
!pip install -U soundfile librosa pydub "accelerate>=0.26.0" tensorboard bitsandbytes
!apt-get -y install ffmpeg
import torch
import librosa
import numpy as np
from pydub import AudioSegment
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import os
# =============================
# 1. Model Setup
# =============================
model_id = "keystats/kiswahili_sahihi_asr"
processor = AutoProcessor.from_pretrained(model_id)
# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
# Use float32 to avoid half precision mismatch issues
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device, dtype=torch.float32)
# =============================
# 2. Convert any format to WAV
# =============================
def convert_to_wav(input_path, output_path="converted.wav"):
    """Convert any FFmpeg-readable audio file to 16 kHz mono WAV."""
    try:
        audio = AudioSegment.from_file(input_path)
        audio = audio.set_frame_rate(16000).set_channels(1)
        audio.export(output_path, format="wav")
        return output_path
    except Exception as e:
        raise RuntimeError(
            f"Could not convert file. Check that FFmpeg is installed and the file format is supported. Error: {e}"
        ) from e

# Just change this path to your audio file
audio_path = "your_swahili_audio.mp3"
wav_path = convert_to_wav(audio_path)
# =============================
# 3. Load audio and chunk
# =============================
audio_input, sr = librosa.load(wav_path, sr=16000, mono=True)
# Note: Whisper's feature extractor works on 30-second windows and truncates longer
# input by default; if the end of each chunk is missing, try chunk_length_s = 30.
chunk_length_s = 60  # seconds
chunk_size = chunk_length_s * sr
num_chunks = int(np.ceil(len(audio_input) / chunk_size))
print(f"Total length: {len(audio_input)/sr:.2f} sec | Splitting into {num_chunks} chunks...")
# =============================
# 4. Transcribe each chunk
# =============================
full_transcription = []
for i in range(num_chunks):
    start = i * chunk_size
    end = min((i + 1) * chunk_size, len(audio_input))
    chunk = audio_input[start:end]
    # Extract features and move them to the same device as the model
    inputs = processor(
        chunk,
        sampling_rate=16000,
        return_tensors="pt",
        padding=True
    ).to(model.device, dtype=torch.float32)
    with torch.no_grad():
        generated_ids = model.generate(**inputs, max_length=20000)
    text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    full_transcription.append(text.strip())
# =============================
# 5. Combine final transcript
# =============================
final_text = " ".join(full_transcription)
print("Final Transcription:")
print(final_text)
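Because each chunk covers a fixed window, the per-chunk texts can also be labelled with their start offsets before joining. The sketch below is an optional add-on, not part of the original script; `format_with_offsets` is a hypothetical helper, and it assumes its input is a list like `full_transcription` from step 4, produced with the same `chunk_length_s`.

```python
def format_with_offsets(chunk_texts, chunk_length_s=60):
    """Prefix each non-empty chunk transcript with its start time as [MM:SS]."""
    lines = []
    for i, text in enumerate(chunk_texts):
        if not text:
            continue  # skip chunks that produced no text (e.g. silence)
        minutes, seconds = divmod(i * chunk_length_s, 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {text}")
    return "\n".join(lines)

# Stand-in for the `full_transcription` list built by the script
example_chunks = ["Habari yako.", "", "Karibu tena kesho."]
print(format_with_offsets(example_chunks))
# [00:00] Habari yako.
# [02:00] Karibu tena kesho.
```

The offsets are only as precise as the chunk length, so they mark where a chunk starts rather than where a sentence starts.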
Example Output
Input Audio | Transcription Output |
---|---|
mashairi_sauti.mp3 | "Karibu kwenye mfumo wetu wa Kiswahili Sahihi." |
mazungumzo_flac.flac | "Habari yako, karibu tena kesho kwa mahojiano mengine." |
Tips for Best Results
- Use clear audio without background noise.
- Long recordings are automatically split into 60-second chunks.
- Works with .mp3, .wav, .m4a, .aac, .ogg, .flac, .amr, and more.
- Audio is converted to 16 kHz mono automatically, so no manual resampling is needed.
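To confirm that a converted file really is 16 kHz mono, the WAV header can be inspected with Python's standard `wave` module. This is a small sketch: `is_16k_mono` is a hypothetical helper name, and the demo writes one second of silence to a temporary file instead of using a real recording.

```python
import os
import tempfile
import wave

def is_16k_mono(wav_path):
    """Return True if the WAV file is single-channel and sampled at 16 kHz."""
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getnchannels() == 1 and wav_file.getframerate() == 16000

# Demo: write one second of 16-bit silence at 16 kHz mono, then check it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    wav_file.setframerate(16000)
    wav_file.writeframes(b"\x00\x00" * 16000)

print(is_16k_mono(demo_path))  # True
```

In the example script, the same check could be applied to the path returned by `convert_to_wav`.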
Acknowledgements
Contribute
- Share more Swahili audio samples
- Report issues or improvements
- Help expand coverage for different accents and dialects
Citation
@model{kiswahili_sahihi_asr,
author = {Jackson Kahungu},
title = {Kiswahili Sahihi ASR - Swahili Audio Transcription},
year = {2025},
publisher = {Hugging Face}
}
Final Note
"If you like the model, leave a like."
This model may not be perfect, but it provides a strong baseline for building future Swahili transcription systems.
Together, we can make Swahili voice technology accessible to everyone. KISWAHILI KITUKUZWE (May Kiswahili be honoured)
Base model: openai/whisper-medium