We use a recurrent sequence-to-sequence Mel-spectrogram prediction network based on Google's Tacotron2 as a baseline. It achieves good performance, has been tested in many different contexts, and renders a speaker's characteristics very realistically. On the vocoder side, we replaced Google's original WaveNet synthesizer with the more recent HiFi-GAN vocoder for more realistic speech prosody and faster training.
### How to use

Usage is based on the Maui Python library:

```python
# IMPORTS
import os
import json

import torch
from omegaconf import OmegaConf

from maui.utils.tacotron2 import text2seq
from maui.utils.hifigan import AttrDict
from maui.models.hifigan import Generator
from maui.models.tacotron2 import Tacotron2

# PATHS TO MODEL CONFIGS AND CHECKPOINTS
MODEL_DIR = "models"
TACO_CONF = os.path.join(MODEL_DIR, "maui-tacotron2.yaml")
OBAMA_CKPT = os.path.join(MODEL_DIR, "obama", "checkpoint_9000")
HIFIGAN_CKPT = os.path.join(MODEL_DIR, "hifigan", "UNIVERSAL_V1", "g_02500000")
HIFIGAN_CONF = os.path.join(MODEL_DIR, "hifigan", "UNIVERSAL_V1", "config.json")

# DEVICE FOR INFERENCE
device = torch.device('cuda')
# device = torch.device('cpu')

# MODEL SETUP
# Tacotron2: text -> Mel-spectrogram
taco_cfg = OmegaConf.load(TACO_CONF)
taco = Tacotron2(taco_cfg.model).to(device).share_memory()
taco = taco.setup_inference(OBAMA_CKPT, device=device)
# HiFi-GAN: Mel-spectrogram -> waveform
with open(HIFIGAN_CONF, 'r') as f:
    data = f.read()
hifigan_cfg = AttrDict(json.loads(data))
hifigan = Generator(hifigan_cfg).to(device).share_memory()
hifigan = hifigan.setup_inference(HIFIGAN_CKPT, device=device)

# TEXT NORMALIZATION
text = "This is the sentence that will be spoken in the voice of Pr. Obama"
sequence = text2seq(text, device=device)

# INFERENCE
_, mel_outputs_postnet, _, _ = taco.inference(sequence)
audio = hifigan.inference(mel_outputs_postnet, device=device)
```
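The waveform returned by the vocoder can then be written to disk. The snippet below is a minimal sketch rather than part of the Maui API: it assumes `audio` is a PyTorch tensor containing a single waveform and that the HiFi-GAN `config.json` exposes the vocoder sampling rate as `sampling_rate`; adjust it to the actual return type if it differs.

```python
# SAVE AUDIO (sketch): assumes `audio` is a torch tensor holding one waveform and
# that hifigan_cfg.sampling_rate matches the vocoder's sampling rate (e.g. 22050 Hz).
import soundfile as sf

waveform = audio.squeeze().detach().cpu().numpy()  # tensor -> 1-D float array
sf.write("obama_tts.wav", waveform, hifigan_cfg.sampling_rate)
```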
 
## Training data