Merge branch 'main' of https://huggingface.co/Yehor/kenlm-ukrainian into main
README.md
@@ -4,5 +4,25 @@ license: cc-by-nc-sa-4.0
 
 This repository contains KenLM models for the Ukrainian language
 
+Metrics for the NEWS models (tested with an acoustic model of [wav2vec2-xls-r-300m model](https://huggingface.co/Yehor/wav2vec2-xls-r-300m-uk-with-small-lm)):
+
+| Model | CER | WER |
+|-|-|-|
+| no LM | 0.0412 | 0.2206 |
+| lm-3gram-50k | 0.0348 | 0.1826 |
+| lm-4gram-50k | 0.0347 | 0.1818 |
+| lm-5gram-50k | 0.0347 | 0.1821 |
+| lm-3gram-100k | 0.031 | 0.1588 |
+| lm-4gram-100k | 0.0308 | 0.1579 |
+| lm-5gram-100k | 0.0308 | 0.1579 |
+| lm-3gram-300k | 0.0261 | 0.1294 |
+| lm-4gram-300k | 0.0261 | 0.1293 |
+| lm-5gram-300k | 0.0261 | 0.1293 |
+| lm-3gram-500k | 0.0248 | 0.1209 |
+| lm-4gram-500k | 0.0247 | 0.1207 |
+| lm-5gram-500k | 0.0247 | 0.1209 |
+
+Files of the models are under the Files and versions section.
+
 Attribution to the NEWS models:
 - Chaplynskyi, D. et al. (2021) lang-uk Ukrainian Ubercorpus [Data set]. https://lang.org.ua/uk/corpora/#anchor4
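
The CER and WER columns in the metrics table added by this change are conventionally computed as an edit distance (over characters or words) divided by the reference length. The following is a minimal sketch of that standard definition. It is not necessarily the evaluation script used to produce the numbers in the README, and the function names (`edit_distance`, `wer`, `cer`, `relative_reduction`) are hypothetical helpers introduced only for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which an error rate drops relative to a baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical transcript pair, not taken from the evaluation set.
    print(wer("це простий приклад речення", "це простий приклад"))
    # WER values from the table: "no LM" versus "lm-4gram-500k".
    print(round(relative_reduction(0.2206, 0.1207), 4))
```

Applied to the table, `relative_reduction` puts the improvement of each language model over the "no LM" baseline on a common scale, which makes it easier to see that the vocabulary size (50k to 500k) matters far more than the n-gram order.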