We use a recurrent sequence-to-sequence Mel-spectrogram prediction network based on Google's Tacotron2 as a baseline. It achieves good performance, has been tested in many different contexts, and renders a speaker's characteristics very realistically. On the vocoder side, we replaced Google's original WaveNet synthesizer with the more recent HiFi-GAN vocoder for more realistic speech prosody and faster training.
### How to use

Usage is based on the Maui Python library:

```python
# IMPORTS
import os
import json

import torch
from omegaconf import OmegaConf

from maui.utils.tacotron2 import text2seq
from maui.utils.hifigan import AttrDict
from maui.models.hifigan import Generator
from maui.models.tacotron2 import Tacotron2

# PATHS TO MODEL CONFIGS AND CHECKPOINTS
MODEL_DIR = "models"
TACO_CONF = os.path.join(MODEL_DIR, "maui-tacotron2.yaml")
OBAMA_CKPT = os.path.join(MODEL_DIR, "obama", "checkpoint_9000")
HIFIGAN_CKPT = os.path.join(MODEL_DIR, "hifigan", "UNIVERSAL_V1", "g_02500000")
HIFIGAN_CONF = os.path.join(MODEL_DIR, "hifigan", "UNIVERSAL_V1", "config.json")

# DEVICE FOR INFERENCE
device = torch.device('cuda')
# device = torch.device('cpu')

# MODEL SETUP
# Tacotron2: text -> Mel-spectrogram
taco_cfg = OmegaConf.load(TACO_CONF)
taco = Tacotron2(taco_cfg.model).to(device).share_memory()
taco = taco.setup_inference(OBAMA_CKPT, device=device)
# HiFi-GAN: Mel-spectrogram -> waveform
with open(HIFIGAN_CONF, 'r') as f:
    data = f.read()
hifigan_cfg = AttrDict(json.loads(data))
hifigan = Generator(hifigan_cfg).to(device).share_memory()
hifigan = hifigan.setup_inference(HIFIGAN_CKPT, device=device)

# TEXT NORMALIZATION
text = "This is the sentence that will be spoken in the voice of Pr. Obama"
sequence = text2seq(text, device=device)

# INFERENCE
_, mel_outputs_postnet, _, _ = taco.inference(sequence)
audio = hifigan.inference(mel_outputs_postnet, device=device)
```
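The waveform returned by the vocoder can then be written to disk. The snippet below is a minimal sketch rather than part of the Maui API: it assumes `audio` is a PyTorch tensor containing a single waveform and that the HiFi-GAN `config.json` exposes the vocoder sampling rate as `sampling_rate`; adjust it to the actual return type if it differs.

```python
# SAVE AUDIO (sketch): assumes `audio` is a torch tensor holding one waveform and
# that hifigan_cfg.sampling_rate matches the vocoder's sampling rate (e.g. 22050 Hz).
import soundfile as sf

waveform = audio.squeeze().detach().cpu().numpy()  # tensor -> 1-D float array
sf.write("obama_tts.wav", waveform, hifigan_cfg.sampling_rate)
```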
 
## Training data