Commit · f6ca04e
1 Parent(s): 7d3ab33
Improve description of the system
README.md
CHANGED
@@ -44,16 +44,17 @@ model-index:
 value: 11.26
 ---
 
-#
+# XLS-R-based CTC model with 5-gram language model from Common Voice
 
-This model is a version of [facebook/wav2vec2-xls-r-2b-22-to-16](https://huggingface.co/facebook/wav2vec2-xls-r-2b-22-to-16) fine-tuned mainly on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - NL dataset (see details below).
-It achieves the following results on the evaluation set (of Common Voice 8.0):
+This model is a version of [facebook/wav2vec2-xls-r-2b-22-to-16](https://huggingface.co/facebook/wav2vec2-xls-r-2b-22-to-16) fine-tuned mainly on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - NL dataset (see details below), on which a small 5-gram language model is added based on the Common Voice training corpus. This model achieves the following results on the evaluation set (of Common Voice 8.0):
 - Wer: 0.0669
 - Cer: 0.0197
 
 ## Model description
 
-The model takes 16kHz sound input, and uses a Wav2Vec2ForCTC decoder with 48 letters to output the final result.
+The model takes 16kHz sound input, and uses a Wav2Vec2ForCTC decoder with 48 letters to output the final result.
+
+To improve accuracy, a beam decoder is used; the beams are scored based on 5-gram language model trained on the Common Voice 8 corpus.
 
 ## Intended uses & limitations
 
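The added README text says beams are scored with a language model. The sketch below illustrates that idea as a simplified n-best rescoring step; it is not the model's actual decoder, which applies the language model during the beam search itself. Everything in it is hypothetical: the candidate transcripts, their acoustic log-probabilities, the toy bigram table standing in for the 5-gram model, and the weights `alpha` and `beta`.

```python
import math

# Hypothetical bigram log-probabilities standing in for the 5-gram model.
TOY_LM = {
    ("<s>", "het"): math.log(0.5),
    ("het", "huis"): math.log(0.4),
    ("het", "huys"): math.log(0.001),
}
# Floor used for word pairs the toy model has never seen (assumed value).
UNSEEN_LOGP = math.log(1e-4)


def lm_logprob(words):
    """Sum bigram log-probabilities over the word sequence."""
    total = 0.0
    prev = "<s>"
    for word in words:
        total += TOY_LM.get((prev, word), UNSEEN_LOGP)
        prev = word
    return total


def rescore(beams, alpha=0.5, beta=1.0):
    """Rank (text, acoustic_logp) candidates, best first.

    score = acoustic_logp + alpha * lm_logp + beta * word_count
    alpha weights the language model; beta is a per-word bonus that
    offsets the LM's tendency to favour shorter outputs.
    """
    scored = []
    for text, acoustic_logp in beams:
        words = text.split()
        score = acoustic_logp + alpha * lm_logprob(words) + beta * len(words)
        scored.append((score, text))
    scored.sort(reverse=True)
    return [text for _, text in scored]


# Two hypothetical candidates: the misspelling is slightly preferred acoustically.
beams = [("het huys", -1.0), ("het huis", -1.2)]
with_lm = rescore(beams)
without_lm = rescore(beams, alpha=0.0)
```

With the language model weighted in, the correctly spelled candidate ranks first; with `alpha=0.0` the ranking falls back to the acoustic scores alone.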