aioxlabs/hifigan-swahili

speech-synthesis

nairaxo committed on Nov 2, 2022

Commit 3c35ca7 (1 parent: cfb4e67)

Update README.md

Files changed (1)

README.md +5 -5

@@ -15,7 +15,7 @@ datasets:
 # Vocoder with HiFIGAN trained on LJSpeech
-This repository provides all the necessary tools for using a [HiFIGAN](https://arxiv.org/abs/2010.05646) vocoder trained with [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).
 The pre-trained model takes in input a spectrogram and produces a waveform in output. Typically, a vocoder is used after a TTS model that converts an input text into a spectrogram.
@@ -46,17 +46,17 @@ from speechbrain.pretrained import Tacotron2
 from speechbrain.pretrained import HIFIGAN
 # Intialize TTS (tacotron2) and Vocoder (HiFIGAN)
-tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir="tmpdir_tts")
-hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")
 # Running the TTS
-mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
 # Running Vocoder (spectrogram-to-waveform)
 waveforms = hifi_gan.decode_batch(mel_output)
 # Save the waverform
-torchaudio.save('example_TTS.wav',waveforms.squeeze(1), 22050)
 ```
 ### Inference on GPU

 # Vocoder with HiFIGAN trained on LJSpeech
+This repository provides all the necessary tools for using a [ALLFA Public](https://github.com/getalp/ALFFA_PUBLIC/tree/master/ASR/SWAHILI).
 The pre-trained model takes in input a spectrogram and produces a waveform in output. Typically, a vocoder is used after a TTS model that converts an input text into a spectrogram.
 from speechbrain.pretrained import HIFIGAN
 # Intialize TTS (tacotron2) and Vocoder (HiFIGAN)
+tacotron2 = Tacotron2.from_hparams(source="aioxlabs/tacotron-swahili", savedir="tmpdir_tts")
+hifi_gan = HIFIGAN.from_hparams(source="aioxlabs/hifigan-swahili", savedir="tmpdir_vocoder")
 # Running the TTS
+mel_output, mel_length, alignment = tacotron2.encode_text("raisi wa jumhuri ya tanzania")
 # Running Vocoder (spectrogram-to-waveform)
 waveforms = hifi_gan.decode_batch(mel_output)
 # Save the waverform
+torchaudio.save('example_TTS.wav',waveforms.squeeze(1), 16000)
 ```
 ### Inference on GPU
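
Review note on the sample-rate change: this commit switches the rate passed to `torchaudio.save` from 22050 to 16000. The sketch below is not part of the README. It uses only the Python standard library, and the helper names `write_wav` and `wav_duration_seconds` are hypothetical. It assumes the Swahili vocoder produces 16 kHz audio, which the commit implies but does not state. It illustrates that the rate written into the file header has to match the rate the samples were generated at, otherwise the same samples play back with a different duration (and pitch).

```python
import io
import math
import struct
import wave


def write_wav(samples, sample_rate):
    """Encode float samples in [-1, 1] as 16-bit mono PCM WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
        wav_file.writeframes(frames)
    return buffer.getvalue()


def wav_duration_seconds(wav_bytes):
    """Return the playback duration implied by the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


# One second of a 440 Hz tone, generated at an assumed 16 kHz output rate.
samples = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]

# Header rate matches the generation rate: playback lasts one second.
correct = wav_duration_seconds(write_wav(samples, 16000))

# Header rate left at the old LJSpeech value: same samples, shorter playback.
mismatched = wav_duration_seconds(write_wav(samples, 22050))

print(f"saved at 16000 Hz: {correct:.3f} s")
print(f"saved at 22050 Hz: {mismatched:.3f} s")
```

If the vocoder's actual output rate differs from 16 kHz, the same reasoning applies with that rate substituted.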