---
license: cc-by-nc-sa-4.0
language:
- uk
tags:
- automatic-speech-recognition
---

## Community

- Discord: https://bit.ly/discord-uds
- Speech Recognition: https://t.me/speech_recognition_uk
- Speech Synthesis: https://t.me/speech_synthesis_uk
- Natural Language Processing: https://t.me/nlp_uk

## Overview

KenLM n-gram language models for Ukrainian, provided as 3-, 4-, and 5-gram variants in sizes from 10k to 500k.

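
One possible way to use a model from this repository is to rescore a list of recognition hypotheses. This card does not say how the language model was combined with the acoustic model, so the sketch below is an assumption: it uses simple n-best rescoring rather than a full beam-search decoder, and treats `alpha` as a language-model weight. The helpers `load_kenlm` and `rescore` are hypothetical names; `kenlm.Model` and its `score` method come from the `kenlm` Python package.

```python
def load_kenlm(path):
    """Load a KenLM model file (needs the `kenlm` package installed)."""
    import kenlm  # imported lazily so `rescore` works without the package
    return kenlm.Model(path)


def rescore(hypotheses, lm, alpha=0.1):
    """Pick the hypothesis with the best combined score.

    hypotheses: list of (text, acoustic_score) pairs, where the acoustic
        score is a log-probability (assumed comparable to the LM score).
    lm: any object with a KenLM-style `score(text, bos, eos)` method,
        which returns a log10 probability of the whole sentence.
    alpha: weight of the language-model score (assumed meaning).
    """
    def combined(item):
        text, acoustic_score = item
        return acoustic_score + alpha * lm.score(text, bos=True, eos=True)

    return max(hypotheses, key=combined)[0]
```

With a real model, usage would look like `rescore(hypotheses, load_kenlm("path/to/model"), alpha=0.1)`, where the path points at a file downloaded from this repository.
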
## Metrics

Evaluated with the [w2v-xls-r-uk](https://huggingface.co/Yehor/w2v-xls-r-uk) acoustic model. CER and WER are reported as fractions (lower is better):

| Model | CER | WER |
|-|-|-|
| no LM | 0.0412 | 0.2206 |
| lm-3gram-10k (alpha=0.1) | 0.0398 | 0.2191 |
| lm-4gram-10k (alpha=0.1) | 0.0398 | 0.219 |
| lm-5gram-10k (alpha=0.1) | 0.0398 | 0.219 |
| lm-3gram-30k | 0.038 | 0.2023 |
| lm-4gram-30k | 0.0379 | 0.2018 |
| lm-5gram-30k | 0.0379 | 0.202 |
| lm-3gram-50k | 0.0348 | 0.1826 |
| lm-4gram-50k | 0.0347 | 0.1818 |
| lm-5gram-50k | 0.0347 | 0.1821 |
| lm-3gram-100k | 0.031 | 0.1588 |
| lm-4gram-100k | 0.0308 | 0.1579 |
| lm-5gram-100k | 0.0308 | 0.1579 |
| lm-3gram-300k | 0.0261 | 0.1294 |
| lm-4gram-300k | 0.0261 | 0.1293 |
| lm-5gram-300k | 0.0261 | 0.1293 |
| lm-3gram-500k | 0.0248 | 0.1209 |
| lm-4gram-500k | 0.0247 | 0.1207 |
| lm-5gram-500k | 0.0247 | 0.1209 |

The KenLM model files are available under the Files and versions tab.

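
For readers who want to reproduce numbers of this kind, here is a minimal sketch of how CER and WER are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, over characters or words respectively. The card does not state which tool or test set produced the table, so this is an illustration of the standard definitions, not the exact evaluation script. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(baseline, value):
    """Relative improvement of `value` over `baseline` (both error rates)."""
    return (baseline - value) / baseline


# Relative WER reduction of lm-4gram-500k over decoding without an LM,
# using the values from the table above.
print(round(relative_reduction(0.2206, 0.1207), 3))
```

Both rates are fractions, matching the table: a WER of 0.1207 means roughly 12 word errors per 100 reference words.
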
## Attribution

- Chaplynskyi, D. et al. (2021) lang-uk Ukrainian Ubercorpus [Data set]. https://lang.org.ua/uk/corpora/#anchor4