Text-to-Speech (TTS) with Tacotron2 trained on Luganda CommonVoice
This repository provides all the necessary tools for Text-to-Speech (TTS) with SpeechBrain.
The pre-trained model takes a short text as input and produces a mel spectrogram as output. To obtain the final waveform, apply a vocoder (e.g., HiFi-GAN) to the generated spectrogram.
### Install SpeechBrain
pip install speechbrain
We encourage you to read the SpeechBrain tutorials to learn more about the toolkit.
### Perform Text-to-Speech (TTS)
import torchaudio
from speechbrain.inference.TTS import Tacotron2
from speechbrain.inference.vocoders import HIFIGAN
# Initialize the TTS model (Tacotron2) and the vocoder (HiFi-GAN)
tacotron2 = Tacotron2.from_hparams(source="sulaimank/tacotron2-cv-females", savedir="tmpdir_tts")
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")
# Running the TTS
mel_output, mel_length, alignment = tacotron2.encode_text("Eddagala eryo lisigala mu nnyaanya okumala wiiki nga bbiri.")
# Running Vocoder (spectrogram-to-waveform)
waveforms = hifi_gan.decode_batch(mel_output)
# Save the waveform (22050 is the sample rate in Hz)
torchaudio.save("example_TTS.wav", waveforms.squeeze(1), 22050)
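The `22050` passed to `torchaudio.save` is the sample rate in Hz, so the length of a generated clip follows directly from its number of samples. The sketch below shows that arithmetic. Both helper names are hypothetical and not part of SpeechBrain, and the hop length of 256 samples per mel frame is an assumption based on the usual LJSpeech Tacotron2 configuration; check it against this model's hyperparameters before relying on it.

```python
SAMPLE_RATE = 22050  # Hz, the value passed to torchaudio.save above


def waveform_duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Return the duration in seconds of a clip with `num_samples` samples."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def mel_frames_to_seconds(
    num_frames: int,
    hop_length: int = 256,  # ASSUMPTION: samples per mel frame, not confirmed by this card
    sample_rate: int = SAMPLE_RATE,
) -> float:
    """Estimate the audio duration represented by `num_frames` mel frames."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_frames * hop_length / sample_rate
```

With the objects from the example above, `waveform_duration_seconds(waveforms.shape[-1])` would give the clip length, since the last dimension of `waveforms` holds the audio samples.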
To synthesize several sentences in a single call, pass a list of strings to `encode_batch`:
from speechbrain.inference.TTS import Tacotron2
tacotron2 = Tacotron2.from_hparams(source="sulaimank/tacotron2-cv-females", savedir="tmpdir_tts")
items = [
"A quick brown fox jumped over the lazy dog",
"How much wood would a woodchuck chuck?",
"Never odd or even"
]
mel_outputs, mel_lengths, alignments = tacotron2.encode_batch(items)
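Because the model expects short inputs, a longer passage may need to be split into sentences before it is passed to `encode_batch`. The following is a minimal sketch of one way to do that with a regular expression. `split_sentences` is a hypothetical helper, not part of SpeechBrain, and the punctuation-based rule is a simplifying assumption that will not handle abbreviations or every writing convention.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split `text` into sentences at '.', '!' or '?' followed by whitespace.

    The terminating punctuation stays attached to each sentence.
    Empty or whitespace-only input yields an empty list.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part.strip() for part in parts if part.strip()]
```

The resulting list can be passed in place of `items`, for example `tacotron2.encode_batch(split_sentences(paragraph))`, where `paragraph` is any string holding the longer text.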
### Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
Base model for sulaimank/tacotron2-cv-females: speechbrain/tts-tacotron2-ljspeech