---
license: cc-by-nc-sa-4.0
---

This repository contains KenLM language models for Ukrainian.

Metrics for the NEWS models, evaluated with the [wav2vec2-xls-r-300m](https://huggingface.co/Yehor/wav2vec2-xls-r-300m-uk-with-small-lm) acoustic model (CER is character error rate, WER is word error rate; lower is better):

| Model | CER | WER |
|-|-|-|
| no LM | 0.0412 | 0.2206 |
| lm-3gram-50k | 0.0348 | 0.1826 |
| lm-4gram-50k | 0.0347 | 0.1818 |
| lm-5gram-50k | 0.0347 | 0.1821 |
| lm-3gram-100k | 0.031 | 0.1588 |
| lm-4gram-100k | 0.0308 | 0.1579 |
| lm-5gram-100k | 0.0308 | 0.1579 |
| lm-3gram-300k | 0.0261 | 0.1294 |
| lm-4gram-300k | 0.0261 | 0.1293 |
| lm-5gram-300k | 0.0261 | 0.1293 |
| lm-3gram-500k | 0.0248 | 0.1209 |
| lm-4gram-500k | 0.0247 | 0.1207 |
| lm-5gram-500k | 0.0247 | 0.1209 |
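
To make the table easier to compare, the following is a minimal sketch that computes the relative WER reduction of each model against the "no LM" baseline. The WER values are copied from the table above; the names `BASELINE_WER`, `WER_BY_MODEL`, and `relative_reduction` are illustrative and not part of this repository.

```python
# WER values copied from the NEWS metrics table in this README.
BASELINE_WER = 0.2206  # the "no LM" row

WER_BY_MODEL = {
    "lm-3gram-50k": 0.1826,
    "lm-4gram-50k": 0.1818,
    "lm-5gram-50k": 0.1821,
    "lm-3gram-100k": 0.1588,
    "lm-4gram-100k": 0.1579,
    "lm-5gram-100k": 0.1579,
    "lm-3gram-300k": 0.1294,
    "lm-4gram-300k": 0.1293,
    "lm-5gram-300k": 0.1293,
    "lm-3gram-500k": 0.1209,
    "lm-4gram-500k": 0.1207,
    "lm-5gram-500k": 0.1209,
}


def relative_reduction(baseline: float, value: float) -> float:
    """Return the error reduction as a fraction of the baseline error."""
    return (baseline - value) / baseline


# Print each model with its WER and its relative improvement over "no LM".
for name, wer in WER_BY_MODEL.items():
    gain = relative_reduction(BASELINE_WER, wer)
    print(f"{name:<15} WER={wer:.4f}  relative reduction={gain:.1%}")

# Model with the lowest WER in the table.
best_model = min(WER_BY_MODEL, key=WER_BY_MODEL.get)
print("lowest WER:", best_model)
```

On these numbers, the lowest WER corresponds to a reduction of roughly 45% relative to decoding without a language model, while the differences between 3-, 4-, and 5-gram models of the same size are much smaller than the differences between sizes.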

The model files are available under the Files and versions section.

Attribution for the NEWS models:
- Chaplynskyi, D. et al. (2021) lang-uk Ukrainian Ubercorpus [Data set]. https://lang.org.ua/uk/corpora/#anchor4