Automatic Speech Recognition for the Belarusian language
A fine-tuned version of facebook/wav2vec2-base on the be subset of the mozilla-foundation/common_voice_8_0 dataset.
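A minimal inference sketch with the acoustic model only (greedy CTC decoding, no language model). It assumes the standard transformers CTC API and a local audio file; `speech.wav` is a hypothetical path.

```python
# Acoustic-model-only inference sketch (greedy CTC decoding, no LM).
# "speech.wav" is a hypothetical example file.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("ales/wav2vec2-cv-be")
model = Wav2Vec2ForCTC.from_pretrained("ales/wav2vec2-cv-be")

# wav2vec2 expects 16 kHz mono audio.
waveform, sample_rate = torchaudio.load("speech.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)
audio = waveform.mean(dim=0).numpy()  # downmix to mono

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```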
The Train, Dev, and Test splits were used as they are defined in the dataset. No additional data from the Validated split was used: only one voicing of each sentence was kept, exactly as the data was split by the CommonVoice CorporaCreator. To build a better model, one could use additional voicings from the Validated split for sentences already present in the Train, Dev, and Test splits, i.e. enlarge those splits.
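For reference, a sketch of loading those splits with the datasets library. The dataset is gated on the Hugging Face Hub, so this assumes you have accepted its terms and are logged in; note that the Dev split is named `validation` there.

```python
# Sketch: load the splits described above (gated dataset, requires Hub login).
from datasets import load_dataset

train = load_dataset("mozilla-foundation/common_voice_8_0", "be", split="train")
dev = load_dataset("mozilla-foundation/common_voice_8_0", "be", split="validation")
test = load_dataset("mozilla-foundation/common_voice_8_0", "be", split="test")

print(len(train), len(dev), len(test))
```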
A language model was built using KenLM: a 5-gram model trained on sentences from the Train + (Other - Dev - Test) splits of the mozilla-foundation/common_voice_8_0 be dataset.
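A rough sketch of how such a corpus and 5-gram model could be produced, assuming a locally compiled KenLM with `lmplz` on the PATH; file paths are placeholders, and any text normalization the author applied is omitted here.

```python
# Sketch: collect sentences from Train + (Other - Dev - Test) and build a
# 5-gram ARPA model with KenLM's lmplz. Paths and the lmplz binary are assumptions.
import subprocess
from datasets import load_dataset

ds = load_dataset("mozilla-foundation/common_voice_8_0", "be")
held_out = set(ds["validation"]["sentence"]) | set(ds["test"]["sentence"])
corpus = list(ds["train"]["sentence"]) + [
    s for s in ds["other"]["sentence"] if s not in held_out
]

with open("lm_corpus.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(corpus) + "\n")

# lmplz reads sentences from stdin and writes an ARPA-format n-gram model to stdout.
with open("lm_corpus.txt", encoding="utf-8") as src, open("5gram.arpa", "w") as dst:
    subprocess.run(["lmplz", "-o", "5"], stdin=src, stdout=dst, check=True)
```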
Source code is available here.
Run the model in a browser
This page contains an interactive demo widget that lets you test the model right in your browser. However, the widget uses the acoustic model only, without the language model that significantly improves overall performance. You can try the full pipeline of acoustic model + language model on the following Spaces page (it also works from a browser).
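For reference, a sketch of the acoustic model + language model pipeline, assuming the model repository bundles a pyctcdecode/KenLM decoder in the layout Wav2Vec2ProcessorWithLM expects (if it does not, a decoder can be built from a local ARPA file with pyctcdecode); `speech.wav` is again a hypothetical path.

```python
# Sketch: CTC beam-search decoding rescored by the n-gram LM via Wav2Vec2ProcessorWithLM.
# Assumes the repo ships a pyctcdecode/KenLM decoder; "speech.wav" is hypothetical.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

processor = Wav2Vec2ProcessorWithLM.from_pretrained("ales/wav2vec2-cv-be")
model = Wav2Vec2ForCTC.from_pretrained("ales/wav2vec2-cv-be")

waveform, sample_rate = torchaudio.load("speech.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)
audio = waveform.mean(dim=0).numpy()

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# batch_decode runs beam search with the 5-gram LM instead of plain greedy decoding.
print(processor.batch_decode(logits.numpy()).text[0])
```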
Evaluation results
Self-reported WER (%) on Common Voice 8 (be):
- Dev WER: 17.610
- Test WER: 18.700
- Dev WER (with LM): 11.500
- Test WER (with LM): 12.400
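The metric above is word error rate (WER). A minimal illustration of how it can be computed with the jiwer package; the reference and hypothesis strings are made up.

```python
# Minimal WER illustration with jiwer; reference and hypothesis are made-up examples.
import jiwer

reference = "добры дзень сябры"
hypothesis = "добры дзень сябра"
print(jiwer.wer(reference, hypothesis))  # 1 substituted word out of 3 -> ~0.333
```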