---
language:
- am
license: mit
tags:
- automatic-speech-recognition
- speech
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---

# Amharic ASR using fine-tuned Wav2vec2 XLSR-53
This is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) trained on the [Amharic Speech Corpus](http://www.openslr.org/25/). The corpus was produced by [Abate et al. (2005)](https://www.isca-speech.org/archive/interspeech_2005/abate05_interspeech.html) (DOI: 10.21437/Interspeech.2005-467).

The model achieves a WER of 26% and a CER of 7% on the validation set of the Amharic Readspeech data.
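
For readers unfamiliar with these metrics, the sketch below shows how WER and CER are commonly defined: the edit (Levenshtein) distance between reference and hypothesis, divided by the reference length, counted in words or characters respectively. This is an illustrative implementation, not the evaluation script used for the numbers above; the function names `edit_distance`, `wer` and `cer` are hypothetical, and both metrics assume a non-empty reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

A hypothesis with one wrong word out of four gives `wer(...) == 0.25`. Values above 1.0 are possible when the hypothesis contains many insertions.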

## Usage
The model can be used as follows:
```python
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model = Wav2Vec2ForCTC.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")
processor = Wav2Vec2Processor.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")

# Load the audio and resample it to 16 kHz.
audio, _ = librosa.load("/path/to/audio.wav", sr=16000)

input_values = processor(
    audio.squeeze(),
    sampling_rate=16000,
    return_tensors="pt"
).input_values

model.eval()
with torch.no_grad():
    logits = model(input_values).logits
    # Greedy decoding: take the most likely token at each frame.
    preds = logits.argmax(-1)
    texts = processor.batch_decode(preds)
print(texts[0])
```
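
The `argmax` followed by `batch_decode` in the example above amounts to greedy CTC decoding. As a minimal sketch of that idea, independent of the `transformers` implementation: repeated token ids are merged and blank tokens are dropped. The helper name `ctc_greedy_collapse` is hypothetical, and `blank_id=0` is an assumption made for illustration; the actual blank id depends on the model's vocabulary.

```python
def ctc_greedy_collapse(ids, blank_id=0):
    """Collapse a frame-level id sequence into a token-level one (CTC rule)."""
    out = []
    prev = None
    for i in ids:
        # Keep an id only when it differs from the previous frame
        # and is not the blank token (blank_id is assumed here).
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out
```

A blank between two identical ids keeps both: `ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0])` yields `[1, 1, 2]`, which is how CTC represents doubled characters.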

## Training
The code to train this model is available at https://github.com/agkphysics/amharic-asr.