---
language: ko
tags:
- text-to-speech
license: other
---
# Torchaudio_Tacotron2_kss
A torchaudio [Tacotron2](https://pytorch.org/audio/stable/generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2) model trained on the KSS dataset.
## License
- code: MIT License
- `pytorch_model.bin` weights: CC BY-NC-SA 4.0 (license of the kss dataset)
## Requirements
```sh
pip install torch torchaudio transformers phonemizer
```
You also need to install [`espeak-ng`](https://github.com/espeak-ng/espeak-ng).
On Windows, additional environment variables must be set. See <https://github.com/bootphon/phonemizer/issues/44>.
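As a rough sketch of the Windows setup, the snippet below sets `PHONEMIZER_ESPEAK_LIBRARY`, the environment variable that recent phonemizer releases read to locate the espeak-ng shared library. The helper name `configure_espeak` and the default DLL path are assumptions for illustration, not part of this repository. Adjust the path to your installation and check the linked issue for your phonemizer version.

```python
import os
from pathlib import Path

# Assumed default install location of espeak-ng on Windows (hypothetical; adjust as needed).
ESPEAK_DLL = r"C:\Program Files\eSpeak NG\libespeak-ng.dll"


def configure_espeak(library_path: str = ESPEAK_DLL) -> bool:
    """Point phonemizer at the espeak-ng shared library.

    The variable is set unconditionally; the return value only reports
    whether a file actually exists at the given path.
    """
    os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = library_path
    return Path(library_path).is_file()
```

Call `configure_espeak()` before the tokenizer is created, so the variable is already in place when phonemizer looks for the library.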
## Usage
```python
import torch
from transformers import AutoModel, AutoTokenizer
repo = "Bingsu/torchaudio_tacotron2_kss"
model = AutoModel.from_pretrained(
    repo,
    trust_remote_code=True,
    revision="589d6557e8b4bb347f49de74270541063ba9c2bc",
)
tokenizer = AutoTokenizer.from_pretrained(repo)
model.eval()
```
```python
vocoder = torch.hub.load("seungwonpark/melgan:aca59909f6dd028ec808f987b154535a7ca3400c", "melgan", trust_repo=True, pretrained=False)
url = "https://huggingface.co/Bingsu/torchaudio_tacotron2_kss/resolve/main/melgan.pt"
state_dict = torch.hub.load_state_dict_from_url(url)
vocoder.load_state_dict(state_dict)
```
The vocoder is the same as the original [seungwonpark/melgan](https://github.com/seungwonpark/melgan), but the original weights were saved on a CUDA device, so they are provided separately in this repository.
```python
# Korean input; roughly "Nice to meet you. This is Tacotron2."
text = "반갑습니다 타코트론2입니다."
inp = tokenizer(text, return_tensors="pt", return_length=True, return_attention_mask=False)
```
```python
with torch.inference_mode():
    out = model(**inp)
    audio = vocoder(out[0])
```
```python
import IPython.display as ipd
ipd.Audio(audio[0].numpy(), rate=22050)
```
<audio src="https://huggingface.co/Bingsu/torchaudio_tacotron2_kss/resolve/main/examples/sample1.wav" controls></audio>