Siddhant committed on
Commit
d390347
·
1 Parent(s): ed4649a

import from zenodo

Files changed (32)
  1. README.md +50 -0
  2. dump/22k/xvector/dev-clean/spk_xvector.ark +0 -0
  3. dump/22k/xvector/dev-clean/spk_xvector.scp +96 -0
  4. dump/22k/xvector/test-clean/spk_xvector.ark +0 -0
  5. dump/22k/xvector/test-clean/spk_xvector.scp +81 -0
  6. dump/22k/xvector/train-clean-460/spk_xvector.ark +0 -0
  7. dump/22k/xvector/train-clean-460/spk_xvector.scp +0 -0
  8. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/config.yaml +400 -0
  9. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_backward_time.png +0 -0
  10. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_fake_loss.png +0 -0
  11. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_forward_time.png +0 -0
  12. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_loss.png +0 -0
  13. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_optim_step_time.png +0 -0
  14. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_real_loss.png +0 -0
  15. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_train_time.png +0 -0
  16. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_adv_loss.png +0 -0
  17. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_backward_time.png +0 -0
  18. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_dur_loss.png +0 -0
  19. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_feat_match_loss.png +0 -0
  20. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_forward_time.png +0 -0
  21. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_kl_loss.png +0 -0
  22. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_loss.png +0 -0
  23. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_mel_loss.png +0 -0
  24. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_optim_step_time.png +0 -0
  25. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_train_time.png +0 -0
  26. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/gpu_max_cached_mem_GB.png +0 -0
  27. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/iter_time.png +0 -0
  28. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/optim0_lr0.png +0 -0
  29. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/optim1_lr0.png +0 -0
  30. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/train_time.png +0 -0
  31. exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/train.total_count.ave_10best.pth +3 -0
  32. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - text-to-speech
+ language: en
+ datasets:
+ - libritts
+ license: cc-by-4.0
+ ---
+ ## ESPnet2 TTS pretrained model
+ ### `kan-bayashi/libritts_xvector_vits`
+ ♻️ Imported from https://zenodo.org/record/5521416/
+
+ This model was trained by kan-bayashi using libritts/tts1 recipe in [espnet](https://github.com/espnet/espnet/).
+ ### Demo: How to use in ESPnet2
+ ```python
+ # coming soon
+ ```
+ ### Citing ESPnet
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
+ year={2018},
+ booktitle={Proceedings of Interspeech},
+ pages={2207--2211},
+ doi={10.21437/Interspeech.2018-1456},
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+ @inproceedings{hayashi2020espnet,
+ title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
+ author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
+ booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+ pages={7654--7658},
+ year={2020},
+ organization={IEEE}
+ }
+ ```
+ or arXiv:
+ ```bibtex
+ @misc{watanabe2018espnet,
+ title={ESPnet: End-to-End Speech Processing Toolkit},
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ year={2018},
+ eprint={1804.00015},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
dump/22k/xvector/dev-clean/spk_xvector.ark ADDED
Binary file (199 kB).
dump/22k/xvector/dev-clean/spk_xvector.scp ADDED
@@ -0,0 +1,96 @@
+ 1272_128104 dump/22k/xvector/dev-clean/spk_xvector.ark:12
+ 1272_135031 dump/22k/xvector/dev-clean/spk_xvector.ark:2082
+ 1272_141231 dump/22k/xvector/dev-clean/spk_xvector.ark:4152
+ 1462_170138 dump/22k/xvector/dev-clean/spk_xvector.ark:6222
+ 1462_170142 dump/22k/xvector/dev-clean/spk_xvector.ark:8292
+ 1462_170145 dump/22k/xvector/dev-clean/spk_xvector.ark:10362
+ 1673_143396 dump/22k/xvector/dev-clean/spk_xvector.ark:12432
+ 1673_143397 dump/22k/xvector/dev-clean/spk_xvector.ark:14502
+ 174_168635 dump/22k/xvector/dev-clean/spk_xvector.ark:16571
+ 174_50561 dump/22k/xvector/dev-clean/spk_xvector.ark:18639
+ 174_84280 dump/22k/xvector/dev-clean/spk_xvector.ark:20707
+ 1919_142785 dump/22k/xvector/dev-clean/spk_xvector.ark:22777
+ 1988_147956 dump/22k/xvector/dev-clean/spk_xvector.ark:24847
+ 1988_148538 dump/22k/xvector/dev-clean/spk_xvector.ark:26917
+ 1988_24833 dump/22k/xvector/dev-clean/spk_xvector.ark:28986
+ 1993_147149 dump/22k/xvector/dev-clean/spk_xvector.ark:31056
+ 1993_147964 dump/22k/xvector/dev-clean/spk_xvector.ark:33126
+ 1993_147965 dump/22k/xvector/dev-clean/spk_xvector.ark:35196
+ 1993_147966 dump/22k/xvector/dev-clean/spk_xvector.ark:37266
+ 2035_147960 dump/22k/xvector/dev-clean/spk_xvector.ark:39336
+ 2035_147961 dump/22k/xvector/dev-clean/spk_xvector.ark:41406
+ 2035_152373 dump/22k/xvector/dev-clean/spk_xvector.ark:43476
+ 2078_142845 dump/22k/xvector/dev-clean/spk_xvector.ark:45546
+ 2086_149214 dump/22k/xvector/dev-clean/spk_xvector.ark:47616
+ 2086_149220 dump/22k/xvector/dev-clean/spk_xvector.ark:49686
+ 2277_149874 dump/22k/xvector/dev-clean/spk_xvector.ark:51756
+ 2277_149896 dump/22k/xvector/dev-clean/spk_xvector.ark:53826
+ 2277_149897 dump/22k/xvector/dev-clean/spk_xvector.ark:55896
+ 2412_153947 dump/22k/xvector/dev-clean/spk_xvector.ark:57966
+ 2412_153948 dump/22k/xvector/dev-clean/spk_xvector.ark:60036
+ 2412_153954 dump/22k/xvector/dev-clean/spk_xvector.ark:62106
+ 2428_83699 dump/22k/xvector/dev-clean/spk_xvector.ark:64175
+ 2428_83705 dump/22k/xvector/dev-clean/spk_xvector.ark:66244
+ 251_118436 dump/22k/xvector/dev-clean/spk_xvector.ark:68313
+ 251_136532 dump/22k/xvector/dev-clean/spk_xvector.ark:70382
+ 251_137823 dump/22k/xvector/dev-clean/spk_xvector.ark:72451
+ 2803_154320 dump/22k/xvector/dev-clean/spk_xvector.ark:74521
+ 2803_154328 dump/22k/xvector/dev-clean/spk_xvector.ark:76591
+ 2803_161169 dump/22k/xvector/dev-clean/spk_xvector.ark:78661
+ 2902_9006 dump/22k/xvector/dev-clean/spk_xvector.ark:80729
+ 2902_9008 dump/22k/xvector/dev-clean/spk_xvector.ark:82797
+ 3000_15664 dump/22k/xvector/dev-clean/spk_xvector.ark:84866
+ 3081_166546 dump/22k/xvector/dev-clean/spk_xvector.ark:86936
+ 3170_137482 dump/22k/xvector/dev-clean/spk_xvector.ark:89006
+ 3536_23268 dump/22k/xvector/dev-clean/spk_xvector.ark:91075
+ 3536_8226 dump/22k/xvector/dev-clean/spk_xvector.ark:93143
+ 3576_138058 dump/22k/xvector/dev-clean/spk_xvector.ark:95213
+ 3752_4943 dump/22k/xvector/dev-clean/spk_xvector.ark:97281
+ 3752_4944 dump/22k/xvector/dev-clean/spk_xvector.ark:99349
+ 3853_163249 dump/22k/xvector/dev-clean/spk_xvector.ark:101419
+ 422_122949 dump/22k/xvector/dev-clean/spk_xvector.ark:103488
+ 5338_24615 dump/22k/xvector/dev-clean/spk_xvector.ark:105557
+ 5338_24640 dump/22k/xvector/dev-clean/spk_xvector.ark:107626
+ 5338_284437 dump/22k/xvector/dev-clean/spk_xvector.ark:109696
+ 5536_43358 dump/22k/xvector/dev-clean/spk_xvector.ark:111765
+ 5536_43359 dump/22k/xvector/dev-clean/spk_xvector.ark:113834
+ 5536_43363 dump/22k/xvector/dev-clean/spk_xvector.ark:115903
+ 5694_64025 dump/22k/xvector/dev-clean/spk_xvector.ark:117972
+ 5694_64029 dump/22k/xvector/dev-clean/spk_xvector.ark:120041
+ 5694_64038 dump/22k/xvector/dev-clean/spk_xvector.ark:122110
+ 5895_34615 dump/22k/xvector/dev-clean/spk_xvector.ark:124179
+ 5895_34622 dump/22k/xvector/dev-clean/spk_xvector.ark:126248
+ 5895_34629 dump/22k/xvector/dev-clean/spk_xvector.ark:128317
+ 6241_61943 dump/22k/xvector/dev-clean/spk_xvector.ark:130386
+ 6241_61946 dump/22k/xvector/dev-clean/spk_xvector.ark:132455
+ 6241_66616 dump/22k/xvector/dev-clean/spk_xvector.ark:134524
+ 6295_244435 dump/22k/xvector/dev-clean/spk_xvector.ark:136594
+ 6295_64301 dump/22k/xvector/dev-clean/spk_xvector.ark:138663
+ 6313_66125 dump/22k/xvector/dev-clean/spk_xvector.ark:140732
+ 6313_66129 dump/22k/xvector/dev-clean/spk_xvector.ark:142801
+ 6313_76958 dump/22k/xvector/dev-clean/spk_xvector.ark:144870
+ 6319_275224 dump/22k/xvector/dev-clean/spk_xvector.ark:146940
+ 6319_57405 dump/22k/xvector/dev-clean/spk_xvector.ark:149009
+ 6319_64726 dump/22k/xvector/dev-clean/spk_xvector.ark:151078
+ 6345_64257 dump/22k/xvector/dev-clean/spk_xvector.ark:153147
+ 6345_93302 dump/22k/xvector/dev-clean/spk_xvector.ark:155216
+ 6345_93306 dump/22k/xvector/dev-clean/spk_xvector.ark:157285
+ 652_129742 dump/22k/xvector/dev-clean/spk_xvector.ark:159354
+ 652_130737 dump/22k/xvector/dev-clean/spk_xvector.ark:161423
+ 777_126732 dump/22k/xvector/dev-clean/spk_xvector.ark:163492
+ 7850_111771 dump/22k/xvector/dev-clean/spk_xvector.ark:165562
+ 7850_281318 dump/22k/xvector/dev-clean/spk_xvector.ark:167632
+ 7850_286674 dump/22k/xvector/dev-clean/spk_xvector.ark:169702
+ 7850_73752 dump/22k/xvector/dev-clean/spk_xvector.ark:171771
+ 7976_105575 dump/22k/xvector/dev-clean/spk_xvector.ark:173841
+ 7976_110124 dump/22k/xvector/dev-clean/spk_xvector.ark:175911
+ 7976_110523 dump/22k/xvector/dev-clean/spk_xvector.ark:177981
+ 8297_275154 dump/22k/xvector/dev-clean/spk_xvector.ark:180051
+ 8297_275155 dump/22k/xvector/dev-clean/spk_xvector.ark:182121
+ 8297_275156 dump/22k/xvector/dev-clean/spk_xvector.ark:184191
+ 84_121123 dump/22k/xvector/dev-clean/spk_xvector.ark:186259
+ 84_121550 dump/22k/xvector/dev-clean/spk_xvector.ark:188327
+ 8842_302196 dump/22k/xvector/dev-clean/spk_xvector.ark:190397
+ 8842_302201 dump/22k/xvector/dev-clean/spk_xvector.ark:192467
+ 8842_302203 dump/22k/xvector/dev-clean/spk_xvector.ark:194537
+ 8842_304647 dump/22k/xvector/dev-clean/spk_xvector.ark:196607
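Each `.scp` line above pairs a key with an `ark` path and a byte offset. The sketch below parses such lines and uses the spacing between consecutive offsets to estimate the stored vector dimension. It assumes the archive uses the uncompressed Kaldi binary float-vector layout (a 10-byte header followed by 4-byte floats) and that each record is preceded by its key and one space; the helper names `parse_scp_line` and `infer_dims` are hypothetical.

```python
def parse_scp_line(line):
    """Split 'key path:offset' into (key, path, byte_offset)."""
    key, rxspec = line.split(maxsplit=1)
    path, offset = rxspec.strip().rsplit(":", 1)
    return key, path, int(offset)


def infer_dims(entries, header_bytes=10, bytes_per_value=4):
    """Estimate vector dimensions from the gaps between consecutive offsets.

    Assumption: each record is 'key' + ' ' + header + float32 payload, and the
    offset points just past the key and the space.
    """
    dims = set()
    for (_, _, off), (next_key, _, next_off) in zip(entries, entries[1:]):
        payload = next_off - off - (len(next_key) + 1) - header_bytes
        if payload % bytes_per_value:
            raise ValueError("gap does not match the assumed record layout")
        dims.add(payload // bytes_per_value)
    return dims


# Four consecutive entries copied from the listing above.
sample = """\
1673_143396 dump/22k/xvector/dev-clean/spk_xvector.ark:12432
1673_143397 dump/22k/xvector/dev-clean/spk_xvector.ark:14502
174_168635 dump/22k/xvector/dev-clean/spk_xvector.ark:16571
174_50561 dump/22k/xvector/dev-clean/spk_xvector.ark:18639
"""
entries = [parse_scp_line(line) for line in sample.splitlines()]
dims = infer_dims(entries)
print(dims)
```

Under these assumptions the sampled gaps all correspond to 512 values per record, which agrees with `spk_embed_dim: 512` in `config.yaml`. The varying gaps (2070, 2069, 2068 bytes) then come only from the differing key lengths.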
dump/22k/xvector/test-clean/spk_xvector.ark ADDED
Binary file (168 kB).
dump/22k/xvector/test-clean/spk_xvector.scp ADDED
@@ -0,0 +1,81 @@
+ 1089_134686 dump/22k/xvector/test-clean/spk_xvector.ark:12
+ 1089_134691 dump/22k/xvector/test-clean/spk_xvector.ark:2082
+ 1188_133604 dump/22k/xvector/test-clean/spk_xvector.ark:4152
+ 121_121726 dump/22k/xvector/test-clean/spk_xvector.ark:6221
+ 121_123859 dump/22k/xvector/test-clean/spk_xvector.ark:8290
+ 121_127105 dump/22k/xvector/test-clean/spk_xvector.ark:10359
+ 1221_135766 dump/22k/xvector/test-clean/spk_xvector.ark:12429
+ 1221_135767 dump/22k/xvector/test-clean/spk_xvector.ark:14499
+ 1284_1180 dump/22k/xvector/test-clean/spk_xvector.ark:16567
+ 1284_1181 dump/22k/xvector/test-clean/spk_xvector.ark:18635
+ 1320_122612 dump/22k/xvector/test-clean/spk_xvector.ark:20705
+ 1320_122617 dump/22k/xvector/test-clean/spk_xvector.ark:22775
+ 1580_141083 dump/22k/xvector/test-clean/spk_xvector.ark:24845
+ 1580_141084 dump/22k/xvector/test-clean/spk_xvector.ark:26915
+ 1995_1826 dump/22k/xvector/test-clean/spk_xvector.ark:28983
+ 1995_1836 dump/22k/xvector/test-clean/spk_xvector.ark:31051
+ 1995_1837 dump/22k/xvector/test-clean/spk_xvector.ark:33119
+ 2300_131720 dump/22k/xvector/test-clean/spk_xvector.ark:35189
+ 237_126133 dump/22k/xvector/test-clean/spk_xvector.ark:37258
+ 237_134493 dump/22k/xvector/test-clean/spk_xvector.ark:39327
+ 237_134500 dump/22k/xvector/test-clean/spk_xvector.ark:41396
+ 260_123286 dump/22k/xvector/test-clean/spk_xvector.ark:43465
+ 260_123288 dump/22k/xvector/test-clean/spk_xvector.ark:45534
+ 260_123440 dump/22k/xvector/test-clean/spk_xvector.ark:47603
+ 2830_3979 dump/22k/xvector/test-clean/spk_xvector.ark:49671
+ 2830_3980 dump/22k/xvector/test-clean/spk_xvector.ark:51739
+ 2961_961 dump/22k/xvector/test-clean/spk_xvector.ark:53806
+ 3570_5694 dump/22k/xvector/test-clean/spk_xvector.ark:55874
+ 3570_5695 dump/22k/xvector/test-clean/spk_xvector.ark:57942
+ 3570_5696 dump/22k/xvector/test-clean/spk_xvector.ark:60010
+ 3575_170457 dump/22k/xvector/test-clean/spk_xvector.ark:62080
+ 3729_6852 dump/22k/xvector/test-clean/spk_xvector.ark:64148
+ 4077_13751 dump/22k/xvector/test-clean/spk_xvector.ark:66217
+ 4077_13754 dump/22k/xvector/test-clean/spk_xvector.ark:68286
+ 4446_2271 dump/22k/xvector/test-clean/spk_xvector.ark:70354
+ 4446_2273 dump/22k/xvector/test-clean/spk_xvector.ark:72422
+ 4446_2275 dump/22k/xvector/test-clean/spk_xvector.ark:74490
+ 4507_16021 dump/22k/xvector/test-clean/spk_xvector.ark:76559
+ 4970_29093 dump/22k/xvector/test-clean/spk_xvector.ark:78628
+ 4970_29095 dump/22k/xvector/test-clean/spk_xvector.ark:80697
+ 4992_23283 dump/22k/xvector/test-clean/spk_xvector.ark:82766
+ 4992_41797 dump/22k/xvector/test-clean/spk_xvector.ark:84835
+ 4992_41806 dump/22k/xvector/test-clean/spk_xvector.ark:86904
+ 5105_28233 dump/22k/xvector/test-clean/spk_xvector.ark:88973
+ 5105_28240 dump/22k/xvector/test-clean/spk_xvector.ark:91042
+ 5105_28241 dump/22k/xvector/test-clean/spk_xvector.ark:93111
+ 5142_33396 dump/22k/xvector/test-clean/spk_xvector.ark:95180
+ 5142_36377 dump/22k/xvector/test-clean/spk_xvector.ark:97249
+ 5142_36586 dump/22k/xvector/test-clean/spk_xvector.ark:99318
+ 5142_36600 dump/22k/xvector/test-clean/spk_xvector.ark:101387
+ 5639_40744 dump/22k/xvector/test-clean/spk_xvector.ark:103456
+ 5683_32865 dump/22k/xvector/test-clean/spk_xvector.ark:105525
+ 5683_32866 dump/22k/xvector/test-clean/spk_xvector.ark:107594
+ 5683_32879 dump/22k/xvector/test-clean/spk_xvector.ark:109663
+ 61_70970 dump/22k/xvector/test-clean/spk_xvector.ark:111730
+ 672_122797 dump/22k/xvector/test-clean/spk_xvector.ark:113799
+ 6829_68769 dump/22k/xvector/test-clean/spk_xvector.ark:115868
+ 6829_68771 dump/22k/xvector/test-clean/spk_xvector.ark:117937
+ 6930_75918 dump/22k/xvector/test-clean/spk_xvector.ark:120006
+ 6930_76324 dump/22k/xvector/test-clean/spk_xvector.ark:122075
+ 6930_81414 dump/22k/xvector/test-clean/spk_xvector.ark:124144
+ 7021_79730 dump/22k/xvector/test-clean/spk_xvector.ark:126213
+ 7021_79740 dump/22k/xvector/test-clean/spk_xvector.ark:128282
+ 7021_79759 dump/22k/xvector/test-clean/spk_xvector.ark:130351
+ 7021_85628 dump/22k/xvector/test-clean/spk_xvector.ark:132420
+ 7127_75946 dump/22k/xvector/test-clean/spk_xvector.ark:134489
+ 7127_75947 dump/22k/xvector/test-clean/spk_xvector.ark:136558
+ 7176_88083 dump/22k/xvector/test-clean/spk_xvector.ark:138627
+ 7176_92135 dump/22k/xvector/test-clean/spk_xvector.ark:140696
+ 7729_102255 dump/22k/xvector/test-clean/spk_xvector.ark:142766
+ 8224_274384 dump/22k/xvector/test-clean/spk_xvector.ark:144836
+ 8230_279154 dump/22k/xvector/test-clean/spk_xvector.ark:146906
+ 8455_210777 dump/22k/xvector/test-clean/spk_xvector.ark:148976
+ 8463_287645 dump/22k/xvector/test-clean/spk_xvector.ark:151046
+ 8463_294825 dump/22k/xvector/test-clean/spk_xvector.ark:153116
+ 8463_294828 dump/22k/xvector/test-clean/spk_xvector.ark:155186
+ 8555_284447 dump/22k/xvector/test-clean/spk_xvector.ark:157256
+ 8555_284449 dump/22k/xvector/test-clean/spk_xvector.ark:159326
+ 8555_292519 dump/22k/xvector/test-clean/spk_xvector.ark:161396
+ 908_157963 dump/22k/xvector/test-clean/spk_xvector.ark:163465
+ 908_31957 dump/22k/xvector/test-clean/spk_xvector.ark:165533
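To show how an offset from a listing like the one above could be used, the following sketch reads one float vector from an archive at a given byte offset. It assumes the uncompressed Kaldi binary float-vector layout (`\0B`, the token `FV `, a size byte, a little-endian int32 dimension, then float32 values). It runs against a small in-memory archive built by a hypothetical `write_vector` helper, not against the real `spk_xvector.ark`.

```python
import io
import struct


def write_vector(stream, key, values):
    """Append 'key ' plus a binary float vector; return the data offset."""
    stream.write(key.encode("ascii") + b" ")
    offset = stream.tell()  # this is the number an .scp entry would store
    stream.write(b"\0B" + b"FV " + b"\x04" + struct.pack("<i", len(values)))
    stream.write(struct.pack(f"<{len(values)}f", *values))
    return offset


def read_vector(stream, offset):
    """Read the float vector whose binary header starts at `offset`."""
    stream.seek(offset)
    if stream.read(2) != b"\0B":
        raise ValueError("missing binary marker")
    if stream.read(3) != b"FV ":
        raise ValueError("not a float vector")
    if stream.read(1) != b"\x04":
        raise ValueError("unexpected integer size")
    (dim,) = struct.unpack("<i", stream.read(4))
    return list(struct.unpack(f"<{dim}f", stream.read(4 * dim)))


# Build a toy two-record archive in memory and read both records back.
ark = io.BytesIO()
first = write_vector(ark, "908_157963", [0.5, -1.0, 2.0])
second = write_vector(ark, "908_31957", [4.0, 8.0, 0.25])
print(first, read_vector(ark, first))
print(second, read_vector(ark, second))
```

For real archives, a maintained reader such as the `kaldiio` package is likely the safer choice, since it also handles compressed and text-mode variants that this sketch ignores.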
dump/22k/xvector/train-clean-460/spk_xvector.ark ADDED
Binary file (5.3 MB).
dump/22k/xvector/train-clean-460/spk_xvector.scp ADDED
The diff for this file is too large to render.
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/config.yaml ADDED
@@ -0,0 +1,400 @@
+ config: ./conf/tuning/train_xvector_vits.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space
+ ngpu: 1
+ seed: 777
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 60056
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 100
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - total_count
+     - max
+ keep_nbest_models: 10
+ grad_clip: -1
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: 50
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: 10000
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 5000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/tts_stats_raw_linear_spectrogram_phn_tacotron_g2p_en_no_space/train/text_shape.phn
+ - exp/tts_stats_raw_linear_spectrogram_phn_tacotron_g2p_en_no_space/train/speech_shape
+ valid_shape_file:
+ - exp/tts_stats_raw_linear_spectrogram_phn_tacotron_g2p_en_no_space/valid/text_shape.phn
+ - exp/tts_stats_raw_linear_spectrogram_phn_tacotron_g2p_en_no_space/valid/speech_shape
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 150
+ - 204800
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/22k/raw/train-clean-460/text
+     - text
+     - text
+ -   - dump/22k/raw/train-clean-460/wav.scp
+     - speech
+     - sound
+ -   - dump/22k/xvector/train-clean-460/xvector.scp
+     - spembs
+     - kaldi_ark
+ valid_data_path_and_name_and_type:
+ -   - dump/22k/raw/dev-clean/text
+     - text
+     - text
+ -   - dump/22k/raw/dev-clean/wav.scp
+     - speech
+     - sound
+ -   - dump/22k/xvector/dev-clean/xvector.scp
+     - spembs
+     - kaldi_ark
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adamw
+ optim_conf:
+     lr: 0.0002
+     betas:
+     - 0.8
+     - 0.99
+     eps: 1.0e-09
+     weight_decay: 0.0
+ scheduler: exponentiallr
+ scheduler_conf:
+     gamma: 0.999875
+ optim2: adamw
+ optim2_conf:
+     lr: 0.0002
+     betas:
+     - 0.8
+     - 0.99
+     eps: 1.0e-09
+     weight_decay: 0.0
+ scheduler2: exponentiallr
+ scheduler2_conf:
+     gamma: 0.999875
+ generator_first: false
+ token_list:
+ - <blank>
+ - <unk>
+ - AH0
+ - T
+ - N
+ - D
+ - S
+ - R
+ - L
+ - IH1
+ - DH
+ - M
+ - K
+ - Z
+ - EH1
+ - AE1
+ - IH0
+ - AH1
+ - W
+ - ','
+ - HH
+ - ER0
+ - P
+ - IY1
+ - V
+ - F
+ - B
+ - UW1
+ - AA1
+ - AY1
+ - AO1
+ - .
+ - EY1
+ - IY0
+ - OW1
+ - NG
+ - G
+ - SH
+ - Y
+ - AW1
+ - CH
+ - ER1
+ - UH1
+ - TH
+ - JH
+ - ''''
+ - '?'
+ - OW0
+ - EH2
+ - '!'
+ - IH2
+ - OY1
+ - EY2
+ - AY2
+ - EH0
+ - UW0
+ - AA2
+ - AE2
+ - OW2
+ - AO2
+ - AE0
+ - AH2
+ - ZH
+ - AA0
+ - UW2
+ - IY2
+ - AY0
+ - AO0
+ - AW2
+ - EY0
+ - UH2
+ - ER2
+ - AW0
+ - '...'
+ - UH0
+ - OY2
+ - . . .
+ - OY0
+ - . . . .
+ - ..
+ - . ...
+ - . .
+ - . . . . .
+ - .. ..
+ - '... .'
+ - <sos/eos>
+ odim: null
+ model_conf: {}
+ use_preprocessor: true
+ token_type: phn
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: tacotron
+ g2p: g2p_en_no_space
+ feats_extract: linear_spectrogram
+ feats_extract_conf:
+     n_fft: 1024
+     hop_length: 256
+     win_length: null
+ normalize: null
+ normalize_conf: {}
+ tts: vits
+ tts_conf:
+     generator_type: vits_generator
+     generator_params:
+         hidden_channels: 192
+         spks: -1
+         spk_embed_dim: 512
+         global_channels: 256
+         segment_size: 32
+         text_encoder_attention_heads: 2
+         text_encoder_ffn_expand: 4
+         text_encoder_blocks: 6
+         text_encoder_positionwise_layer_type: conv1d
+         text_encoder_positionwise_conv_kernel_size: 3
+         text_encoder_positional_encoding_layer_type: rel_pos
+         text_encoder_self_attention_layer_type: rel_selfattn
+         text_encoder_activation_type: swish
+         text_encoder_normalize_before: true
+         text_encoder_dropout_rate: 0.1
+         text_encoder_positional_dropout_rate: 0.0
+         text_encoder_attention_dropout_rate: 0.1
+         use_macaron_style_in_text_encoder: true
+         use_conformer_conv_in_text_encoder: false
+         text_encoder_conformer_kernel_size: -1
+         decoder_kernel_size: 7
+         decoder_channels: 512
+         decoder_upsample_scales:
+         - 8
+         - 8
+         - 2
+         - 2
+         decoder_upsample_kernel_sizes:
+         - 16
+         - 16
+         - 4
+         - 4
+         decoder_resblock_kernel_sizes:
+         - 3
+         - 7
+         - 11
+         decoder_resblock_dilations:
+         -   - 1
+             - 3
+             - 5
+         -   - 1
+             - 3
+             - 5
+         -   - 1
+             - 3
+             - 5
+         use_weight_norm_in_decoder: true
+         posterior_encoder_kernel_size: 5
+         posterior_encoder_layers: 16
+         posterior_encoder_stacks: 1
+         posterior_encoder_base_dilation: 1
+         posterior_encoder_dropout_rate: 0.0
+         use_weight_norm_in_posterior_encoder: true
+         flow_flows: 4
+         flow_kernel_size: 5
+         flow_base_dilation: 1
+         flow_layers: 4
+         flow_dropout_rate: 0.0
+         use_weight_norm_in_flow: true
+         use_only_mean_in_flow: true
+         stochastic_duration_predictor_kernel_size: 3
+         stochastic_duration_predictor_dropout_rate: 0.5
+         stochastic_duration_predictor_flows: 4
+         stochastic_duration_predictor_dds_conv_layers: 3
+         vocabs: 86
+         aux_channels: 513
+     discriminator_type: hifigan_multi_scale_multi_period_discriminator
+     discriminator_params:
+         scales: 1
+         scale_downsample_pooling: AvgPool1d
+         scale_downsample_pooling_params:
+             kernel_size: 4
+             stride: 2
+             padding: 2
+         scale_discriminator_params:
+             in_channels: 1
+             out_channels: 1
+             kernel_sizes:
+             - 15
+             - 41
+             - 5
+             - 3
+             channels: 128
+             max_downsample_channels: 1024
+             max_groups: 16
+             bias: true
+             downsample_scales:
+             - 2
+             - 2
+             - 4
+             - 4
+             - 1
+             nonlinear_activation: LeakyReLU
+             nonlinear_activation_params:
+                 negative_slope: 0.1
+             use_weight_norm: true
+             use_spectral_norm: false
+         follow_official_norm: false
+         periods:
+         - 2
+         - 3
+         - 5
+         - 7
+         - 11
+         period_discriminator_params:
+             in_channels: 1
+             out_channels: 1
+             kernel_sizes:
+             - 5
+             - 3
+             channels: 32
+             downsample_scales:
+             - 3
+             - 3
+             - 3
+             - 3
+             - 1
+             max_downsample_channels: 1024
+             bias: true
+             nonlinear_activation: LeakyReLU
+             nonlinear_activation_params:
+                 negative_slope: 0.1
+             use_weight_norm: true
+             use_spectral_norm: false
+     generator_adv_loss_params:
+         average_by_discriminators: false
+         loss_type: mse
+     discriminator_adv_loss_params:
+         average_by_discriminators: false
+         loss_type: mse
+     feat_match_loss_params:
+         average_by_discriminators: false
+         average_by_layers: false
+         include_final_outputs: true
+     mel_loss_params:
+         fs: 22050
+         n_fft: 1024
+         hop_length: 256
+         win_length: null
+         window: hann
+         n_mels: 80
+         fmin: 0
+         fmax: null
+         log_base: null
+     lambda_adv: 1.0
+     lambda_mel: 45.0
+     lambda_feat_match: 2.0
+     lambda_dur: 1.0
+     lambda_kl: 1.0
+     sampling_rate: 22050
+     cache_generator_outputs: true
+ pitch_extract: null
+ pitch_extract_conf: {}
+ pitch_normalize: null
+ pitch_normalize_conf: {}
+ energy_extract: null
+ energy_extract_conf: {}
+ energy_normalize: null
+ energy_normalize_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.3a2
+ distributed: true
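Several numbers in this config can be cross-checked against each other. The sketch below does so under three assumptions that the config itself does not state: that the decoder upsampling factors should multiply to `hop_length`, that a linear spectrogram has `n_fft // 2 + 1` bins, and that `segment_size` counts spectrogram frames. The function name `check_vits_shapes` is hypothetical.

```python
from math import prod

# Values copied from config.yaml above.
n_fft = 1024
hop_length = 256
decoder_upsample_scales = [8, 8, 2, 2]
aux_channels = 513
segment_size = 32
sampling_rate = 22050


def check_vits_shapes(n_fft, hop_length, upsample_scales, aux_channels):
    """Return named consistency checks; True means the values agree."""
    return {
        # One latent frame should expand to hop_length waveform samples.
        "upsample_matches_hop": prod(upsample_scales) == hop_length,
        # A one-sided FFT of size n_fft yields n_fft // 2 + 1 bins.
        "aux_channels_matches_fft": aux_channels == n_fft // 2 + 1,
    }


checks = check_vits_shapes(n_fft, hop_length, decoder_upsample_scales, aux_channels)
segment_samples = segment_size * hop_length      # samples per training segment
segment_seconds = segment_samples / sampling_rate

print(checks)
print(segment_samples, round(segment_seconds, 3))
```

If the frame interpretation of `segment_size` holds, each training segment covers 8192 samples, or roughly 0.37 s of audio at 22.05 kHz.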
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_backward_time.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_fake_loss.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_forward_time.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_loss.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_optim_step_time.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_real_loss.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_train_time.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_adv_loss.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_backward_time.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_dur_loss.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_feat_match_loss.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_forward_time.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_kl_loss.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_loss.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_mel_loss.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_optim_step_time.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_train_time.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/gpu_max_cached_mem_GB.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/iter_time.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/optim0_lr0.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/optim1_lr0.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/images/train_time.png ADDED
exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/train.total_count.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:05eb282aa88c7dfad30305cbb614589ebd9f0aec078fa6bf13befd5a5660ffff
+ size 386477966
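The three added lines are a Git LFS pointer, not the checkpoint itself. A minimal sketch of how such a pointer could be parsed and used to validate downloaded bytes follows; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the in-memory comparison is only practical for small payloads.

```python
import hashlib

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:05eb282aa88c7dfad30305cbb614589ebd9f0aec078fa6bf13befd5a5660ffff
size 386477966
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of an LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that `data` has the size and digest recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

For a file of this size (about 386 MB) it would be more sensible to hash in chunks with `hashlib.sha256().update(...)` while streaming from disk instead of loading everything into memory.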
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: 0.10.3a2
+ files:
+   model_file: exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/train.total_count.ave_10best.pth
+ python: "3.7.3 (default, Mar 27 2019, 22:11:17) \n[GCC 7.3.0]"
+ timestamp: 1632318084.168329
+ torch: 1.7.1
+ yaml_files:
+   train_config: exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space/config.yaml
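The `timestamp` field is easier to read as a calendar date. The sketch below converts it, assuming the value is seconds since the Unix epoch; the function name `packed_at` is hypothetical.

```python
from datetime import datetime, timezone


def packed_at(timestamp):
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


stamp = packed_at(1632318084.168329)  # value copied from meta.yaml above
print(stamp.isoformat())
```

Under that assumption the model was packed on 2021-09-22 at about 13:41 UTC.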