---
language:
- am
license: mit
tags:
- automatic-speech-recognition
- speech
metrics:
- wer
- cer
---
# Amharic ASR using fine-tuned Wav2Vec2 XLSR-53
This is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) trained on the [Amharic Speech Corpus](http://www.openslr.org/25/), which was produced by [Abate et al. (2005)](https://www.isca-speech.org/archive/interspeech_2005/abate05_interspeech.html) (doi: 10.21437/Interspeech.2005-467).
## Usage

The model can be used as follows:
```python
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load the fine-tuned model and its processor from the Hugging Face Hub.
model = Wav2Vec2ForCTC.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")
processor = Wav2Vec2Processor.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")

# Load the audio file and resample it to 16 kHz, the rate the model expects.
audio, _ = librosa.load("/path/to/audio.wav", sr=16000)

input_values = processor(
    audio.squeeze(),
    sampling_rate=16000,
    return_tensors="pt"
).input_values

# Greedy CTC decoding: take the most likely token at each frame, then decode to text.
model.eval()
with torch.no_grad():
    logits = model(input_values).logits
preds = logits.argmax(-1)
texts = processor.batch_decode(preds)
print(texts[0])
```
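
The metadata above lists word error rate (WER) and character error rate (CER) as the reported metrics. A minimal sketch of how these can be computed with the Hugging Face `evaluate` library is shown below; the `references` and `predictions` lists are placeholders for your own ground-truth transcripts and model outputs, not values from this model card.

```python
import evaluate

# Placeholders: substitute reference Amharic transcripts and the transcripts
# produced by the model in the usage example above.
references = ["..."]
predictions = ["..."]

# Load the WER and CER metrics (both require the `jiwer` package).
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```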