Bingsu committed on
Commit 0a42ee9 • 1 Parent(s): b79620c

Create README.md

Files changed (1)
  1. README.md +69 -0
README.md ADDED
@@ -0,0 +1,69 @@
+ ---
+ language: ko
+ tags:
+ - text-to-speech
+ license: other
+ ---
+
+ # Torchaudio_Tacotron2_kss
+
+ A torchaudio [Tacotron2](https://pytorch.org/audio/stable/generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2) model trained on the KSS (Korean Single Speaker Speech) dataset.
+
+ ## License
+
+ - code: MIT License
+ - `pytorch_model.bin` weights: CC BY-NC-SA 4.0 (license of the kss dataset)
+
+ ## Requirements
+
+ ```sh
+ pip install torch torchaudio transformers phonemizer
+ ```
+
+ You also need to install [`espeak-ng`](https://github.com/espeak-ng/espeak-ng).
+
+ On Windows, you also need to set additional environment variables; see <https://github.com/bootphon/phonemizer/issues/44>.
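
As a rough illustration of the Windows note above, the sketch below sets one such variable from Python before the tokenizer is loaded. It assumes a phonemizer version that reads `PHONEMIZER_ESPEAK_LIBRARY`; the helper name `configure_espeak` and the DLL path are hypothetical, so check the linked issue for the variable your version expects.

```python
import os


def configure_espeak(library_path, env=None):
    """Point phonemizer at an espeak-ng shared library, if the file exists.

    Returns True when the variable was set, False when the path is missing.
    """
    env = os.environ if env is None else env
    if not os.path.isfile(library_path):
        return False
    # Assumption: this is the variable read by the installed phonemizer version.
    env["PHONEMIZER_ESPEAK_LIBRARY"] = library_path
    return True


# Hypothetical install location; adjust it to where espeak-ng lives on your system.
configure_espeak(r"C:\Program Files\eSpeak NG\libespeak-ng.dll")
```

Call it before importing anything that loads phonemizer, so the variable is already in place when the library looks for espeak-ng.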
+
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ repo = "Bingsu/torchaudio_tacotron2_kss"
+ # trust_remote_code=True runs the model code shipped in the repo;
+ # pinning `revision` fixes which commit of that code is used.
+ model = AutoModel.from_pretrained(
+     repo,
+     trust_remote_code=True,
+     revision="589d6557e8b4bb347f49de74270541063ba9c2bc",
+ )
+ tokenizer = AutoTokenizer.from_pretrained(repo)
+ model.eval()  # switch to evaluation mode
+ ```
+
+ ```python
+ # Build the MelGAN vocoder without its bundled pretrained weights.
+ vocoder = torch.hub.load("seungwonpark/melgan:aca59909f6dd028ec808f987b154535a7ca3400c", "melgan", trust_repo=True, pretrained=False)
+ # Load the vocoder weights hosted in this repository instead.
+ url = "https://huggingface.co/Bingsu/torchaudio_tacotron2_kss/resolve/main/melgan.pt"
+ state_dict = torch.hub.load_state_dict_from_url(url)
+ vocoder.load_state_dict(state_dict)
+ ```
+
+ The vocoder is the same as the original [seungwonpark/melgan](https://github.com/seungwonpark/melgan), but its published weights were saved on a CUDA device, so the weights are provided separately in this repository.
+
+ ```python
+ # Korean input: "Nice to meet you. This is Tacotron 2."
+ text = "반갑습니다 타코트론2입니다."
+ inp = tokenizer(text, return_tensors="pt", return_length=True, return_attention_mask=False)
+ ```
+
+ ```python
+ with torch.inference_mode():
+     out = model(**inp)
+     # The first model output is passed to the vocoder to produce a waveform.
+     audio = vocoder(out[0])
+ ```
+
+ ```python
+ import IPython.display as ipd
+
+ # Play the generated audio in a notebook at a 22,050 Hz sample rate.
+ ipd.Audio(audio[0].numpy(), rate=22050)
+ ```
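
Outside a notebook, the waveform can be written to disk instead of played inline. The sketch below uses only the Python standard library and assumes the samples are mono floats in [-1.0, 1.0]. `save_wav` is a hypothetical helper, not part of this repository, and the `audio[0].flatten().tolist()` call in the comment is an assumption about the vocoder's output shape.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=22050):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Returns the number of samples written.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample (16-bit)
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
    return len(frames) // 2


# Assumed usage with the tensors from the steps above:
# save_wav(audio[0].flatten().tolist(), "output.wav", sample_rate=22050)
```

The 22,050 Hz default matches the rate passed to `ipd.Audio` above.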
+
+ <audio src="https://huggingface.co/Bingsu/torchaudio_tacotron2_kss/resolve/main/examples/sample1.wav" controls></audio>