---
language: ko
tags:
- text-to-speech
license: other
---

# Torchaudio_Tacotron2_kss

A torchaudio [Tacotron2](https://pytorch.org/audio/stable/generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2) model trained on the KSS (Korean Single Speaker Speech) dataset.

## License

- code: MIT License
- `pytorch_model.bin` weights: CC BY-NC-SA 4.0 (the license of the KSS dataset)

## Requirements

```sh
pip install torch torchaudio transformers phonemizer
```

You also need to install [`espeak-ng`](https://github.com/espeak-ng/espeak-ng) (used by `phonemizer`).

If you are using Windows, you need to set additional environment variables. See: <https://github.com/bootphon/phonemizer/issues/44>
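
Before running the usage example, it can help to check that espeak-ng is discoverable. The sketch below is a hypothetical helper, not part of this repository: it looks for an `espeak-ng` (or `espeak`) executable on `PATH` and, on Windows, checks whether `PHONEMIZER_ESPEAK_LIBRARY` is set. That variable name is an assumption based on recent `phonemizer` releases; the linked issue remains the reference for the exact Windows setup.

```python
import os
import platform
import shutil
from typing import List


def check_espeak_setup() -> List[str]:
    """Return a list of likely problems with the espeak-ng setup (empty if none found).

    Hypothetical helper: these are quick heuristics, not phonemizer's own checks.
    """
    problems: List[str] = []
    # Heuristic: an installed espeak-ng usually puts its executable on PATH.
    if shutil.which("espeak-ng") is None and shutil.which("espeak") is None:
        problems.append("no espeak-ng/espeak executable found on PATH")
    # Assumption: recent phonemizer releases read PHONEMIZER_ESPEAK_LIBRARY
    # to locate the espeak-ng DLL on Windows.
    if platform.system() == "Windows" and not os.environ.get("PHONEMIZER_ESPEAK_LIBRARY"):
        problems.append("PHONEMIZER_ESPEAK_LIBRARY is not set")
    return problems


if __name__ == "__main__":
    issues = check_espeak_setup()
    print("espeak-ng setup looks OK" if not issues else "\n".join(issues))
```

An empty list only means these simple checks passed; `phonemizer` itself may still report a problem when it first loads the backend.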

## Usage

```python
import torch
from transformers import AutoModel, AutoTokenizer

repo = "Bingsu/torchaudio_tacotron2_kss"
model = AutoModel.from_pretrained(
    repo,
    trust_remote_code=True,
    revision="589d6557e8b4bb347f49de74270541063ba9c2bc",
)
tokenizer = AutoTokenizer.from_pretrained(repo)
model.eval()
```

```python
vocoder = torch.hub.load(
    "seungwonpark/melgan:aca59909f6dd028ec808f987b154535a7ca3400c",
    "melgan",
    trust_repo=True,
    pretrained=False,  # skip the original weights; the copy hosted in this repo is loaded below
)
url = "https://huggingface.co/Bingsu/torchaudio_tacotron2_kss/resolve/main/melgan.pt"
state_dict = torch.hub.load_state_dict_from_url(url)
vocoder.load_state_dict(state_dict)
vocoder.eval()  # switch to evaluation mode for inference
```

The vocoder is the same as the original [seungwonpark/melgan](https://github.com/seungwonpark/melgan), but its published weights were saved on CUDA, so I uploaded a separate copy (`melgan.pt`) to this repository.
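
For reference, a checkpoint that was saved on a GPU can usually be read on a CPU-only machine by passing `map_location` when loading, which remaps the stored tensors to the given device. The following is a generic sketch of that PyTorch technique, not the procedure used to produce `melgan.pt`: it saves a small `torch.nn.Linear` state dict to an in-memory buffer and reloads it with `map_location="cpu"`.

```python
import io

import torch


def roundtrip_to_cpu(module: torch.nn.Module) -> dict:
    """Serialize a module's state dict and load it back with all tensors on the CPU."""
    buffer = io.BytesIO()
    torch.save(module.state_dict(), buffer)
    buffer.seek(0)
    # map_location="cpu" remaps any CUDA tensors in the checkpoint to the CPU while loading,
    # so the file can be read on a machine without a GPU.
    return torch.load(buffer, map_location="cpu")


# Uses the GPU if one is present, so the round trip actually moves tensors off CUDA.
device = "cuda" if torch.cuda.is_available() else "cpu"
layer = torch.nn.Linear(4, 2).to(device)
state_dict = roundtrip_to_cpu(layer)
print({name: tensor.device for name, tensor in state_dict.items()})
```

`torch.hub.load_state_dict_from_url` accepts the same `map_location` argument.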

```python
text = "반갑습니다 타코트론 2입니다."  # "Nice to meet you, I'm Tacotron 2."
inp = tokenizer(text, return_tensors="pt", return_length=True, return_attention_mask=False)
```

```python
with torch.inference_mode():
    out = model(**inp)
    audio = vocoder(out[0])
```

```python
import IPython.display as ipd

ipd.Audio(audio[0].numpy(), rate=22050)
```

<audio src="https://huggingface.co/Bingsu/torchaudio_tacotron2_kss/resolve/main/examples/sample1.wav" controls></audio>
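
Outside a notebook, the waveform can be written to a file instead of being played with `IPython.display`. The sketch below uses only the standard library `wave` module; `write_wav` is a hypothetical helper, and it assumes the vocoder output is a mono float waveform roughly in [-1, 1] at 22050 Hz, the rate used in the example above.

```python
import struct
import wave
from typing import Iterable


def write_wav(path: str, samples: Iterable[float], rate: int = 22050) -> None:
    """Write mono float samples (expected in [-1, 1]) as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        # Clip first so the int16 conversion cannot overflow.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))  # little-endian signed 16-bit
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(rate)
        wav_file.writeframes(bytes(frames))
```

With the objects from the usage example, and assuming `audio[0]` holds a single channel, a call could look like `write_wav("sample.wav", audio[0].squeeze().tolist())`.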