karpnv committed on
Commit
125b4b7
·
1 Parent(s): 6210fd9

Acoustic and language models

Files changed (1)
  1. README.md +42 -3
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: other
- ---
+ # **Acoustic and language models**
+
+ The acoustic model is built on the [QuartzNet15x5](https://arxiv.org/pdf/1910.10261.pdf) architecture and trained with the [NeMo toolkit](https://github.com/NVIDIA/NeMo/tree/r1.0.0b4).
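
QuartzNet is trained with CTC loss, so the acoustic model emits one token id per audio frame, and those ids have to be collapsed into text. The "Greedy decoder" row in the evaluation table is the simplest way of doing that. Below is a minimal sketch, not the NeMo implementation: the toy vocabulary, the `ctc_greedy_decode` helper, and the choice of the last index as the blank id are all assumptions made for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, vocabulary, blank_id):
    """Turn per-frame argmax ids into text.

    CTC rule: first merge consecutive repeats, then drop blank tokens.
    """
    collapsed = [token for token, _ in groupby(frame_ids)]
    return "".join(vocabulary[t] for t in collapsed if t != blank_id)


# Hypothetical toy vocabulary; the real model ships its own character set.
vocabulary = [" ", "а", "н", "о"]
# Assumption: the blank token is the extra index after the last character.
blank_id = len(vocabulary)

# Per-frame argmax ids. The blank between the two "н" frames keeps the
# double letter from being merged into one.
frame_ids = [1, 1, 2, 4, 2, 2, 1]
print(ctc_greedy_decode(frame_ids, vocabulary, blank_id))
```

The beam search rows in the table replace this per-frame argmax with a search over several candidate transcripts, rescored by one of the n-gram language models.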
+
+ Three n-gram language models were created with the [KenLM Language Model Toolkit](https://kheafield.com/code/kenlm):
+
+ * LM built on the [Common Crawl](https://commoncrawl.org) Russian dataset
+ * LM built on the Golos train set
+ * LM built on the [Common Crawl](https://commoncrawl.org) and Golos datasets together (50/50)
+
+ | Archive                  | Size   | Link                |
+ |--------------------------|--------|---------------------|
+ | QuartzNet15x5_golos.nemo | 68 MB  | https://sc.link/ZMv |
+ | KenLMs.tar               | 4.8 GB | https://sc.link/YL0 |
+
+ Golos data and models are also available in DataHub ML Space, a hub of pre-trained models, datasets, and containers. You can train the model and deploy it on the high-performance SberCloud infrastructure in [ML Space](https://sbercloud.ru/ru/aicloud/mlspace), a full-cycle machine learning development platform for data science team collaboration, built on the Christofari supercomputer.
+
+ ## **Evaluation**
+
+ Word Error Rate (WER), in percent, for different decoders and test sets:
+
+ | Decoder \ Test set                         | Crowd test | Farfield test | MCV<sup>1</sup> dev | MCV<sup>1</sup> test |
+ |--------------------------------------------|------------|---------------|---------------------|----------------------|
+ | Greedy decoder                             | 4.389 %    | 14.949 %      | 9.314 %             | 11.278 %             |
+ | Beam Search with Common Crawl LM           | 4.709 %    | 12.503 %      | 6.341 %             | 7.976 %              |
+ | Beam Search with Golos train set LM        | 3.548 %    | 12.384 %      | -                   | -                    |
+ | Beam Search with Common Crawl and Golos LM | 3.318 %    | 11.488 %      | 6.4 %               | 8.06 %               |
+
+ <sup>1</sup> MCV: [Common Voice](https://commonvoice.mozilla.org), Mozilla's initiative to help teach machines how real people speak.
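
For readers who want to compute figures of this kind themselves, the sketch below shows the usual definition of word error rate: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is an illustration under stated assumptions, not the scoring script behind the table. The `word_error_rate` helper and the example sentences are hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example pair, not taken from the Golos test sets.
reference = "включи свет на кухне"
hypothesis = "включи свет в кухне"
print(f"{100 * word_error_rate(reference, hypothesis):.3f} %")
```

A corpus-level figure is normally obtained by summing the edit distances over all utterances and dividing by the total number of reference words, which is not the same as averaging per-utterance rates.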
+
+ ## **Resources**
+
+ [[arxiv.org] Golos: Russian Dataset for Speech Research](https://arxiv.org/abs/2106.10161)
+
+ [[habr.com] Golos: the largest manually annotated Russian-language speech dataset is now publicly available (in Russian)](https://habr.com/ru/company/sberdevices/blog/559496/)
+
+ [[habr.com] How to improve Russian speech recognition to 3% WER using open data (in Russian)](https://habr.com/ru/company/sberdevices/blog/569082/)