SberDevices/quartznet-russian


karpnv committed on May 10, 2022

Commit

125b4b7

1 Parent(s): 6210fd9

Acoustic and language models


Files changed (1)

README.md +42 -3

@@ -1,3 +1,42 @@
----
-license: other
----

+# **Acoustic and language models**
+The acoustic model is built on the [QuartzNet15x5](https://arxiv.org/pdf/1910.10261.pdf) architecture and trained with the [NeMo toolkit](https://github.com/NVIDIA/NeMo/tree/r1.0.0b4).
+Three n-gram language models (LMs) are built with the [KenLM Language Model Toolkit](https://kheafield.com/code/kenlm):
+* LM built on [Common Crawl](https://commoncrawl.org) Russian dataset
+* LM built on Golos train set
+* LM built on [Common Crawl](https://commoncrawl.org) and Golos datasets together (50/50)
+| Archive                  | Size   | Link                |
+|--------------------------|--------|---------------------|
+| QuartzNet15x5_golos.nemo | 68 MB  | https://sc.link/ZMv |
+| KenLMs.tar               | 4.8 GB | https://sc.link/YL0 |
+Golos data and models are also available in DataHub ML Space, a hub of pre-trained models, datasets, and containers. You can train the model and deploy it on the high-performance SberCloud infrastructure in [ML Space](https://sbercloud.ru/ru/aicloud/mlspace), a full-cycle machine learning development platform for collaboration among data science teams, built on the Christofari supercomputer.
+## **Evaluation**
+Word Error Rate (WER), in percent, on different test sets:
+| Decoder \ Test set                         | Crowd test | Farfield test | MCV<sup>1</sup> dev | MCV<sup>1</sup> test |
+|--------------------------------------------|------------|---------------|---------------------|----------------------|
+| Greedy decoder                             | 4.389 %    | 14.949 %      | 9.314 %             | 11.278 %             |
+| Beam Search with Common Crawl LM           | 4.709 %    | 12.503 %      | 6.341 %             | 7.976 %              |
+| Beam Search with Golos train set LM        | 3.548 %    | 12.384 %      | -                   | -                    |
+| Beam Search with Common Crawl and Golos LM | 3.318 %    | 11.488 %      | 6.4 %               | 8.06 %               |
+<sup>1</sup> MCV: Mozilla [Common Voice](https://commonvoice.mozilla.org), Mozilla's initiative to help teach machines how real people speak.
+## **Resources**
+* [[arxiv.org] Golos: Russian Dataset for Speech Research](https://arxiv.org/abs/2106.10161)
+* [[habr.com] Golos: the largest manually annotated Russian speech dataset, now publicly available (in Russian)](https://habr.com/ru/company/sberdevices/blog/559496/)
+* [[habr.com] How to improve Russian speech recognition to 3% WER using open data (in Russian)](https://habr.com/ru/company/sberdevices/blog/569082/)
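
The Evaluation table in this README reports Word Error Rate without saying how it was computed. As a rough illustration only, the sketch below uses the conventional definition: word-level edit distance (substitutions, deletions, insertions) summed over all utterances and divided by the total number of reference words. The function names `edit_distance` and `corpus_wer` are hypothetical, and whitespace tokenization with no text normalization is an assumption; the authors' actual scoring script may differ.

```python
from typing import Iterable, List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev: List[int] = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """WER in percent over (reference, hypothesis) transcript pairs.

    Assumption: transcripts are split on whitespace and compared verbatim.
    """
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_errors += edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references must contain at least one word")
    return 100.0 * total_errors / total_words
```

For example, `corpus_wer([("a b c d", "a x c")])` counts one substitution and one deletion against four reference words. Pooling errors over the whole test set, rather than averaging per-utterance rates, keeps short utterances from dominating the score.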