---
license: apache-2.0
base_model: facebook/wav2vec2-large-xlsr-53
tags:
- generated_from_trainer
datasets:
- common_voice_13_0
metrics:
- wer
model-index:
- name: wav2vec2-large-xlsr-mvc-swahili
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common_voice_13_0
      type: common_voice_13_0
      config: sw
      split: test
      args: sw
    metrics:
    - name: Wer
      type: wer
      value: 0.2
language:
- sw
---

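# wav2vec2-large-xlsr-mvc-swahili

This model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) for Swahili (`sw`) automatic speech recognition, trained on the Common Voice 13.0 dataset. It reports a word error rate (WER) of 0.2 on the test split.

<!--Following inspiration from [alamsher/wav2vec2-large-xlsr-53-common-voice-s](https://huggingface.co/alamsher/wav2vec2-large-xlsr-53-common-voice-sw)-->
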
# How to use the model

There is a known issue with the vocabulary: it appears to include special characters that were not accounted for during training.

You can try the model with the following code:

```python
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForCTC

repo_name = "eddiegulay/wav2vec2-large-xlsr-mvc-swahili"
processor = AutoProcessor.from_pretrained(repo_name)
model = AutoModelForCTC.from_pretrained(repo_name)

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)


def transcribe(audio_path):
    # Load the audio file and resample it to the 16 kHz rate the model expects
    audio_input, sample_rate = torchaudio.load(audio_path)
    target_sample_rate = 16000
    audio_input = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=target_sample_rate)(audio_input)

    # Preprocess the first audio channel
    input_dict = processor(audio_input[0].numpy(), return_tensors="pt", padding=True, sampling_rate=target_sample_rate)

    # Perform inference (no gradients needed) and pick the most likely token per frame
    with torch.no_grad():
        logits = model(input_dict.input_values.to(device)).logits
    pred_ids = torch.argmax(logits, dim=-1)[0]
    transcription = processor.decode(pred_ids)

    return transcription


transcript = transcribe('your_audio.mp3')
print(transcript)
```
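
To look into the vocabulary issue mentioned above, one option is to list the tokens that are neither ordinary letters nor expected special tokens. The following is only a sketch: `find_unexpected_tokens` is a hypothetical helper, not part of the model or of `transformers`, and the `expected_special` set and the sample vocabulary are assumptions made for illustration.

```python
def find_unexpected_tokens(
    vocab,
    expected_special=("<pad>", "<s>", "</s>", "<unk>", "[PAD]", "[UNK]", "|"),
):
    """Return vocabulary tokens that are not letters, apostrophes, or expected special tokens."""
    unexpected = []
    for token in vocab:
        # Skip tokens assumed to be CTC special tokens or the word delimiter
        if token in expected_special:
            continue
        # Treat alphabetic tokens and the plain apostrophe as ordinary characters
        if token.isalpha() or token == "'":
            continue
        unexpected.append(token)
    return sorted(unexpected)


# Hypothetical vocabulary for illustration only; it is not the model's real vocabulary
sample_vocab = {"<pad>": 0, "<unk>": 1, "|": 2, "a": 3, "b": 4, "'": 5, "\u2019": 6, "%": 7}
unexpected_tokens = find_unexpected_tokens(sample_vocab)
print(unexpected_tokens)
```

With the model loaded as shown earlier, calling `find_unexpected_tokens(processor.tokenizer.get_vocab())` should list the candidate special characters in the real vocabulary.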