Update README.md
README.md
We use a recurrent sequence-to-sequence Mel-spectrogram prediction network based on Google's Tacotron2 as a baseline. It achieves good performance, has been tested in multiple contexts, and gives a very realistic rendering of a speaker's characteristics. On the vocoder side, we replaced Google's original WaveNet synthesizer with the more recent HiFi-GAN vocoder for more realistic speech prosody and faster training.

### How to use

Usage is based on the Maui Python library:

```python
# IMPORTS
import json
import os

import torch
from omegaconf import OmegaConf

from maui.utils.tacotron2 import text2seq
from maui.utils.hifigan import AttrDict
from maui.models.hifigan import Generator
from maui.models.tacotron2 import Tacotron2

# PATHS
MODEL_DIR = "models"
TACO_CONF = os.path.join(MODEL_DIR, "maui-tacotron2.yaml")
OBAMA_CKPT = os.path.join(MODEL_DIR, "obama", "checkpoint_9000")
HIFIGAN_CKPT = os.path.join(MODEL_DIR, "hifigan", "UNIVERSAL_V1", "g_02500000")
HIFIGAN_CONF = os.path.join(MODEL_DIR, "hifigan", "UNIVERSAL_V1", "config.json")

# DEVICE FOR INFERENCE
device = torch.device('cuda')
# device = torch.device('cpu')  # uncomment to run on CPU

# MODEL SETUP
# Tacotron2: text -> Mel-spectrogram
taco_cfg = OmegaConf.load(TACO_CONF)
taco = Tacotron2(taco_cfg.model).to(device).share_memory()
taco = taco.setup_inference(OBAMA_CKPT, device=device)

# HiFi-GAN: Mel-spectrogram -> waveform
with open(HIFIGAN_CONF, 'r') as f:
    data = f.read()
hifigan_cfg = AttrDict(json.loads(data))
hifigan = Generator(hifigan_cfg).to(device).share_memory()
hifigan = hifigan.setup_inference(HIFIGAN_CKPT, device=device)

# TEXT NORMALIZATION
text = "This is the sentence that will be spoken in the voice of Pr. Obama"
sequence = text2seq(text, device=device)

# INFERENCE
_, mel_outputs_postnet, _, _ = taco.inference(sequence)
audio = hifigan.inference(mel_outputs_postnet, device=device)
```
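The `audio` variable holds the synthesized waveform. As a minimal follow-up sketch, you can write it to a WAV file; this assumes `audio` is a float tensor or array of samples in [-1, 1] and uses the common Tacotron2/HiFi-GAN sampling rate of 22050 Hz (check the model configs for the actual value):

```python
# Saving the synthesized speech (sketch). The 22050 Hz rate and the float
# sample range are assumptions; verify them against the Tacotron2/HiFi-GAN configs.
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 22050  # assumed; read the real value from the model config

samples = audio.squeeze().detach().cpu().numpy() if hasattr(audio, "detach") else np.asarray(audio).squeeze()
wavfile.write("obama_tts.wav", SAMPLE_RATE, (samples * 32767).astype(np.int16))
```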
## Training data