XLS-R-based CTC model with 5-gram language model from Open Subtitles
This model is a version of facebook/wav2vec2-xls-r-2b-22-to-16 fine-tuned mainly on the CGN dataset, as well as on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - NL dataset (see details below), combined with a large 5-gram language model trained on the Open Subtitles Dutch corpus. This model achieves the following results on the evaluation set (of Common Voice 8.0):
- Wer: 0.04057
- Cer: 0.01222
Model description
The model takes 16kHz sound input, and uses a Wav2Vec2ForCTC decoder with 48 letters to output the letter-transcription probabilities per frame.
To improve accuracy, a beam-search decoder based on pyctcdecode is then used; it reranks the most promising alignments based on a 5-gram language model trained on the Open Subtitles Dutch corpus.
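To illustrate how per-frame letter probabilities turn into text, here is a minimal sketch of plain greedy CTC decoding on a made-up 3-symbol vocabulary. The actual model emits 48 letters and is decoded with pyctcdecode's beam search plus the 5-gram language model, not this greedy pass:

```python
# Greedy CTC decoding sketch: pick the most likely symbol per frame,
# merge consecutive repeats, then drop the CTC blank token.

BLANK = "_"  # stand-in for the CTC blank token (illustrative only)

def ctc_greedy_decode(frame_probs, vocab):
    """Collapse per-frame probability rows into a transcription."""
    # 1. Argmax symbol in each frame.
    best = [vocab[max(range(len(row)), key=row.__getitem__)] for row in frame_probs]
    # 2. Merge consecutive duplicates, then remove blanks.
    out, prev = [], None
    for sym in best:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)

# Frames argmax to: j, j, _, a, a  ->  collapses to "ja"
probs = [
    [0.1, 0.8, 0.1],    # j
    [0.2, 0.7, 0.1],    # j
    [0.9, 0.05, 0.05],  # _
    [0.1, 0.1, 0.8],    # a
    [0.1, 0.2, 0.7],    # a
]
print(ctc_greedy_decode(probs, ["_", "j", "a"]))  # -> ja
```

The blank token is what lets CTC distinguish a genuine double letter (e.g. "aa") from one letter held across several frames; the beam-search decoder applies the same collapsing rules while scoring candidate texts with the language model.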
Intended uses & limitations
This model can be used to transcribe spoken Dutch (from the Netherlands or Flanders) to text (without punctuation).
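Since the model expects 16kHz input, recordings at other sample rates must be resampled first. A minimal stdlib-only sketch of linear-interpolation resampling (a real pipeline would use a proper resampler such as torchaudio or librosa; the function name here is illustrative):

```python
# Linear-interpolation resampling sketch: stretch a list of samples
# from src_rate to dst_rate. Good enough to show the idea; production
# code should use a bandlimited resampler to avoid aliasing artifacts.

def resample_linear(samples, src_rate, dst_rate):
    """Resample `samples` from src_rate Hz to dst_rate Hz."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in source samples
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)     # clamp at the final sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Upsample a tiny 8 kHz fragment to 16 kHz: twice as many samples.
audio_8k = [0.0, 1.0, 0.0, -1.0]
audio_16k = resample_linear(audio_8k, 8000, 16000)
print(len(audio_16k))  # -> 8
```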
Training and evaluation data
The model was:
- initialized with the 2B parameter model from Facebook.
- trained 5 epochs (6000 iterations of batch size 32) on the cv8/nl dataset.
- trained 1 epoch (36000 iterations of batch size 32) on the cgn dataset.
- trained 5 epochs (6000 iterations of batch size 32) on the cv8/nl dataset.
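The arithmetic implied by this schedule: iterations times batch size gives the number of training examples processed in each stage.

```python
# Examples processed per stage, from the iteration counts and batch
# sizes stated above.
stages = [
    ("cv8/nl, first pass", 6000, 32),
    ("cgn", 36000, 32),
    ("cv8/nl, second pass", 6000, 32),
]
for name, iterations, batch_size in stages:
    print(f"{name}: {iterations * batch_size} examples processed")
# cv8/nl, first pass: 192000 examples processed
# cgn: 1152000 examples processed
# cv8/nl, second pass: 192000 examples processed
```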
Framework versions
- Transformers 4.16.0
- Pytorch 1.10.2+cu102
- Datasets 1.18.3
- Tokenizers 0.11.0
Evaluation results
- Test WER on Common Voice 8 (self-reported): 4.060
- Test CER on Common Voice 8 (self-reported): 1.220
- Test WER on Robust Speech Event - Dev Data (self-reported): 17.770
- Test CER on Robust Speech Event - Dev Data (self-reported): 9.770
- Test WER on Robust Speech Event - Test Data (self-reported): 16.320