speechbrain/tts-hifigan-libritts-16kHz

speech-synthesis


speechbrainteam committed on Oct 10, 2023

Commit 3f31acd · 1 Parent(s): 810cc65

Update README.md

Files changed (1)

README.md +4 -2


@@ -36,6 +36,7 @@ Please notice that we encourage you to read our tutorials and learn more about
 ### Using the Vocoder
 ```python
 import torch
 from speechbrain.pretrained import HIFIGAN
@@ -46,13 +47,14 @@ mel_specs = torch.rand(2, 80,298)
 waveforms = hifi_gan.decode_batch(mel_specs)
 ```
 ```python
 import torchaudio
 from speechbrain.pretrained import HIFIGAN
 from speechbrain.lobes.models.FastSpeech2 import mel_spectogram
 # Load a pretrained HIFIGAN Vocoder
-hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-libritts-16kHz", savedir="tmpdir")
 # Load an audio file (an example file can be found in this repository)
 # Ensure that the audio signal is sampled at 16000 Hz; refer to the provided link for a 22050 Hz Vocoder.
@@ -89,7 +91,7 @@ waveforms = hifi_gan.decode_batch(spectrogram)
 # Save the reconstructed audio as a waveform
 torchaudio.save('waveform_reconstructed.wav', waveforms.squeeze(1), 16000)
-# If everything is set up correctly, the original and reconstructed audio should be nearly indistinguishable.
 ```

 ### Using the Vocoder
+- *Basic Usage:*
 ```python
 import torch
 from speechbrain.pretrained import HIFIGAN
 waveforms = hifi_gan.decode_batch(mel_specs)
 ```
+- *Spectrogram to Waveform Conversion:*
 ```python
 import torchaudio
 from speechbrain.pretrained import HIFIGAN
 from speechbrain.lobes.models.FastSpeech2 import mel_spectogram
 # Load a pretrained HIFIGAN Vocoder
+hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-libritts-16kHz", savedir="vocoder_16khz")
 # Load an audio file (an example file can be found in this repository)
 # Ensure that the audio signal is sampled at 16000 Hz; refer to the provided link for a 22050 Hz Vocoder.
 # Save the reconstructed audio as a waveform
 torchaudio.save('waveform_reconstructed.wav', waveforms.squeeze(1), 16000)
+# If everything is set up correctly, the original and reconstructed audio should be nearly indistinguishable
 ```
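
The README comments above ask that the input audio be sampled at 16000 Hz before it is passed to this vocoder. As a minimal sketch of that pre-check, assuming the input is an uncompressed PCM WAV file, the header can be inspected with Python's standard `wave` module before any model is loaded. The helper names `wav_sample_rate` and `is_vocoder_compatible` are hypothetical and are not part of SpeechBrain.

```python
import wave

# Sample rate stated in the README comments for this vocoder.
EXPECTED_SAMPLE_RATE = 16000


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM WAV file header."""
    # Hypothetical helper: the stdlib wave module only reads uncompressed PCM WAV.
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_vocoder_compatible(path, expected_rate=EXPECTED_SAMPLE_RATE):
    """Return True if the file's sample rate matches the rate the vocoder expects."""
    return wav_sample_rate(path) == expected_rate
```

A file that fails this check would need to be resampled to 16000 Hz first, or used with the 22050 Hz vocoder that the README comment refers to.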