Amharic ASR using fine-tuned Wav2vec2 XLSR-53

This is a fine-tuned version of facebook/wav2vec2-large-xlsr-53, trained on the Amharic Speech Corpus produced by Abate et al. (2005) (doi:10.21437/Interspeech.2005-467).

The model achieves a WER of 26% and a CER of 7% on the validation set of the Amharic Readspeech data.
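For reference, word error rate (WER) is the word-level edit distance (substitutions + deletions + insertions) between the hypothesis and the reference, divided by the number of reference words. Character error rate (CER) is the same quantity computed over characters. The following is a minimal sketch of a corpus-level computation. It is not necessarily the evaluation script behind the figures above. The function names are illustrative, and whether spaces count as characters for CER is an assumption; this sketch counts them.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: every substitution, insertion and deletion costs 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_error_rate(refs: List[str], hyps: List[str],
                      tokenize: Callable[[str], Sequence]) -> float:
    """Total edits over the corpus divided by the total number of reference tokens."""
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return errors / total


def wer(refs: List[str], hyps: List[str]) -> float:
    return corpus_error_rate(refs, hyps, str.split)  # tokens are whitespace-separated words


def cer(refs: List[str], hyps: List[str]) -> float:
    return corpus_error_rate(refs, hyps, list)  # tokens are characters, spaces included
```

Summing edits over the whole set before dividing weights long utterances more heavily than averaging per-utterance rates would. This is the usual convention for reporting corpus-level WER.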

Usage

The model can be used as follows:

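import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load the fine-tuned model and its matching processor (feature extractor + tokenizer)
model = Wav2Vec2ForCTC.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")
processor = Wav2Vec2Processor.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")

# Load the audio as mono and resample it to 16 kHz, the rate the model expects
audio, _ = librosa.load("/path/to/audio.wav", sr=16000)

input_values = processor(
    audio.squeeze(),
    sampling_rate=16000,
    return_tensors="pt"
).input_values

model.eval()
with torch.no_grad():
    logits = model(input_values).logits
    # Greedy CTC decoding: take the most likely token per frame, then let the
    # tokenizer collapse repeats and remove blank (padding) tokens
    preds = logits.argmax(-1)
    texts = processor.batch_decode(preds)
print(texts[0])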
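The `argmax` followed by `batch_decode` step is greedy CTC decoding. The model emits one token per audio frame. The tokenizer then merges consecutive repeats, drops the blank token (for Wav2Vec2 CTC tokenizers this is the padding token), and turns the word-delimiter token `|` into a space. The following is a minimal pure-Python sketch of that procedure. The tiny `VOCAB` mapping and the function name are hypothetical and do not reflect this model's actual vocabulary.

```python
from itertools import groupby
from typing import Dict, List

# Hypothetical toy vocabulary; the real model's vocabulary is loaded by its tokenizer.
VOCAB: Dict[int, str] = {0: "<pad>", 1: "|", 2: "ሰ", 3: "ላ", 4: "ም"}


def ctc_greedy_decode(frame_ids: List[int], id_to_token: Dict[int, str],
                      blank_id: int, word_delimiter: str = "|") -> str:
    # 1. Collapse runs of identical consecutive frame predictions into one token
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop the blank token, which separates genuine repeats and marks "no output"
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # 3. Map the word delimiter to spaces and normalise surrounding whitespace
    text = "".join(" " if t == word_delimiter else t for t in tokens)
    return " ".join(text.split())


print(ctc_greedy_decode([2, 2, 0, 3, 3, 0, 0, 4, 1, 1], VOCAB, blank_id=0))  # ሰላም
```

Because repeats are merged before blanks are removed, a doubled character can only be produced when a blank separates the two occurrences. For example, `[2, 0, 2]` decodes to `ሰሰ`, while `[2, 2]` decodes to a single `ሰ`.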

Training

The code to train this model is available at https://github.com/agkphysics/amharic-asr.

Model size: 316M parameters (F32 tensors, Safetensors format).