---
language:
- am
license: mit
tags:
- automatic-speech-recognition
- speech
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---

# Amharic ASR using fine-tuned Wav2vec2 XLSR-53

This is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) trained on the [Amharic Speech Corpus](http://www.openslr.org/25/). This corpus was produced by [Abate et al. (2005)](https://www.isca-speech.org/archive/interspeech_2005/abate05_interspeech.html) (DOI: 10.21437/Interspeech.2005-467).

The model achieves a word error rate (WER) of 26% and a character error rate (CER) of 7% on the validation set of the Amharic Readspeech data.

## Usage

The model can be used as follows:

```python
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model = Wav2Vec2ForCTC.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")
processor = Wav2Vec2Processor.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")

# Load the audio as mono, resampled to 16 kHz (the sampling rate wav2vec2 expects)
audio, _ = librosa.load("/path/to/audio.wav", sr=16000)

# Convert the raw waveform into model inputs as a PyTorch tensor
input_values = processor(
    audio.squeeze(), sampling_rate=16000, return_tensors="pt"
).input_values

model.eval()
with torch.no_grad():
    logits = model(input_values).logits

# Greedy decoding: take the most likely token at each frame,
# then let the processor turn the CTC token sequence into text
preds = logits.argmax(-1)
texts = processor.batch_decode(preds)
print(texts[0])
```

## Training

The code to train this model is available at https://github.com/agkphysics/amharic-asr.
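
## Evaluation metrics

The WER and CER figures reported for this model are both edit-distance ratios: the number of substitutions, insertions and deletions needed to turn the hypothesis into the reference, divided by the reference length (in words for WER, in characters for CER). The sketch below illustrates this in plain Python. It is not the evaluation code used for this model; the function names are hypothetical, and details such as text normalisation, whether spaces count toward CER, and whether scores are pooled over the whole set (as in `corpus_wer` here) or averaged per utterance are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    # prev[j] holds the distance between the first i-1 items of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)]


def wer(reference, hypothesis):
    """Word error rate for a single utterance (hypothetical helper)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate for a single utterance; spaces are counted (assumption)."""
    return edit_distance(reference, hypothesis) / len(reference)


def corpus_wer(references, hypotheses):
    """WER pooled over a whole set: total edits divided by total reference words."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return edits / total


print(wer("the cat sat", "the cat sit"))
print(cer("abc", "abd"))
```

Pooling edits over the whole set, rather than averaging per-utterance rates, keeps short utterances from dominating the score; which convention produced the 26% and 7% figures is not stated here.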