ddwkim/asr-conformer-transformerlm-ksponspeech

@@ -27,9 +27,9 @@ SpeechBrain. For a better experience, we encourage you to learn more about
 [SpeechBrain](https://speechbrain.github.io).
 The performance of the model is the following:
-| Release | eval clean CER | eval other CER | GPUs |
-|:-------------:|:--------------:|:--------------:|:--------:|
-| 09-05-21 | 7.86 | 8.93 | 6xA100 80GB |
+| Release  | eval clean CER | eval other CER |    GPUs     |
+| :------: | :------------: | :------------: | :---------: |
+| 09-05-21 |     7.48%      |     8.38%      | 6xA100 80GB |
 ## Pipeline description
@@ -105,4 +105,30 @@ Please, cite SpeechBrain if you use it for your research or business.
   primaryClass={eess.AS},
   note={arXiv:2106.04624}
 }
+```
+# Citing the model
+```bibtex
+@misc{returnzero,
+  title = {ReturnZero Conformer Korean ASR model},
+  author = {Dongwon Kim and Dongwoo Kim and Roh Jeongkyu},
+  year = {2021},
+  howpublished = {\url{https://huggingface.co/ddwkim/asr-conformer-transformerlm-ksponspeech}},
+}
+```
+# Citing KsponSpeech dataset
+```bibtex
+@Article{app10196936,
+AUTHOR = {Bang, Jeong-Uk and Yun, Seung and Kim, Seung-Hi and Choi, Mu-Yeol and Lee, Min-Kyu and Kim, Yeo-Jeong and Kim, Dong-Hyun and Park, Jun and Lee, Young-Jik and Kim, Sang-Hun},
+TITLE = {KsponSpeech: Korean Spontaneous Speech Corpus for Automatic Speech Recognition},
+JOURNAL = {Applied Sciences},
+VOLUME = {10},
+YEAR = {2020},
+NUMBER = {19},
+ARTICLE-NUMBER = {6936},
+URL = {https://www.mdpi.com/2076-3417/10/19/6936},
+ISSN = {2076-3417},
+DOI = {10.3390/app10196936}
+}
 ```
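
The CER columns in the performance table changed by this diff are character error rates. As a rough illustration of what that metric measures, here is a minimal sketch that computes CER as character-level edit distance divided by the number of reference characters. The function names `edit_distance` and `cer` are hypothetical, and the choice to count every character of the input strings (including spaces) is an assumption; this is not the evaluation script that produced the reported numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level character error rate, in percent.

    Assumption: every character in the reference strings counts toward
    the denominator; any text normalization is left to the caller.
    """
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return 100.0 * errors / total
```

Summing errors and reference lengths over the whole set before dividing gives a corpus-level rate, so long utterances weigh more than short ones.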

hyperparams.yaml

@@ -5,7 +5,8 @@
 # Tokens: unigram
 # losses: CTC + KLdiv (Label Smoothing loss)
 # Training: KsponSpeech 965.2h
-# Authors: Dongwon Kim, Dongwoo Kim
+# Based on the works of: Jianyuan Zhong, Titouan Parcollet 2021
+# Authors: Dongwon Kim, Dongwoo Kim 2021
 # ############################################################################
 # Seed needs to be set at top of yaml, before objects with parameters are made
@@ -40,7 +41,7 @@ max_decode_ratio: 1.0
 valid_search_interval: 10
 valid_beam_size: 10
 test_beam_size: 60
-lm_weight: 0.60
+lm_weight: 0.20
 ctc_weight_decode: 0.40
 ############################## models ################################
@@ -105,8 +106,8 @@ decoder: !new:speechbrain.decoders.S2STransformerBeamSearch
     ctc_weight: !ref <ctc_weight_decode>
     lm_weight: !ref <lm_weight>
     lm_modules: !ref <lm_model>
-    temperature: 1.15
-    temperature_lm: 1.15
+    temperature: 1.25
+    temperature_lm: 1.25
     using_eos_threshold: False
     length_normalization: True
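
The decoding values touched in this diff (`lm_weight`, `ctc_weight_decode`, `temperature`, `temperature_lm`) all shape how per-token scores are combined during beam search. The sketch below shows one common way such knobs interact: temperature-scaled log-softmax for the attention decoder and the language model, followed by a weighted sum with the CTC score. The functions `log_softmax` and `combine_scores` are hypothetical, and the interpolation formula is an assumption made for illustration; the exact computation inside `speechbrain.decoders.S2STransformerBeamSearch` may differ.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-softmax of logits divided by a temperature (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def combine_scores(att_logits, ctc_log_probs, lm_logits,
                   ctc_weight=0.40, lm_weight=0.20,
                   temperature=1.25, temperature_lm=1.25):
    """Illustrative per-token score for one beam-search step.

    Assumed formula (not taken from SpeechBrain's source):
    (1 - ctc_weight) * attention + ctc_weight * CTC + lm_weight * LM.
    """
    att = log_softmax(att_logits, temperature)
    lm = log_softmax(lm_logits, temperature_lm)
    return [(1.0 - ctc_weight) * a + ctc_weight * c + lm_weight * l
            for a, c, l in zip(att, ctc_log_probs, lm)]
```

Under this sketch, a temperature above 1 flattens each distribution, so raising it from 1.15 to 1.25 makes both models slightly less peaked, while lowering `lm_weight` from 0.60 to 0.20 reduces how much the language model can reorder candidates.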