Text-to-Speech (TTS) with Tacotron2 trained on Luganda CommonVoice

This repository provides all the necessary tools for Text-to-Speech (TTS) with SpeechBrain.

The pre-trained model takes a short text as input and produces a spectrogram as output. The final waveform is obtained by applying a vocoder (e.g., HiFIGAN) to the generated spectrogram.

Install SpeechBrain

pip install speechbrain

We encourage you to read our tutorials and learn more about SpeechBrain.

Perform Text-to-Speech (TTS)

import torchaudio
from speechbrain.inference.TTS import Tacotron2
from speechbrain.inference.vocoders import HIFIGAN

# Initialize TTS (Tacotron2) and Vocoder (HiFIGAN)
tacotron2 = Tacotron2.from_hparams(source="sulaimank/tacotron2-cv-females", savedir="tmpdir_tts")
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")

# Running the TTS
mel_output, mel_length, alignment = tacotron2.encode_text("Eddagala eryo lisigala mu nnyaanya okumala wiiki nga bbiri.")

# Running Vocoder (spectrogram-to-waveform)
waveforms = hifi_gan.decode_batch(mel_output)

# Save the waveform (22050 Hz sample rate)
torchaudio.save('example_TTS.wav', waveforms.squeeze(1), 22050)
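
Because the model expects a short text as input, a longer passage can be split into sentences before synthesis. The following is a minimal sketch, not part of SpeechBrain: the helper `split_into_sentences` is a hypothetical name, and it assumes that sentences end with `.`, `!` or `?` followed by whitespace, which may not hold for every text.

```python
import re


def split_into_sentences(text):
    """Split text into sentences (hypothetical helper, not a SpeechBrain API).

    Assumes a sentence ends with '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, e.g. when the input itself is empty.
    return [part for part in parts if part]


# The example sentence from this README, repeated to form a longer passage.
paragraph = (
    "Eddagala eryo lisigala mu nnyaanya okumala wiiki nga bbiri. "
    "Eddagala eryo lisigala mu nnyaanya okumala wiiki nga bbiri."
)
sentences = split_into_sentences(paragraph)
```

Each element of `sentences` can then be passed to `tacotron2.encode_text`, one sentence at a time.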

To generate several sentences in a single call, pass them as a list:

from speechbrain.inference.TTS import Tacotron2
tacotron2 = Tacotron2.from_hparams(source="sulaimank/tacotron2-cv-females", savedir="tmpdir_tts")
items = [
    "A quick brown fox jumped over the lazy dog",
    "How much wood would a woodchuck chuck?",
    "Never odd or even",
]
mel_outputs, mel_lengths, alignments = tacotron2.encode_batch(items)
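
In batch mode the spectrograms are padded to the length of the longest item, so a waveform produced from a shorter item may end with padding. The sketch below shows one way to trim it using the values in `mel_lengths`. It rests on assumptions that are not stated in this README: that each mel frame corresponds to 256 audio samples (`hop_length=256`, which should be checked against the vocoder's hyperparameters), and the helper names `valid_sample_count` and `trim_padding` are hypothetical.

```python
def valid_sample_count(mel_length, hop_length=256):
    """Number of audio samples covered by `mel_length` mel frames.

    hop_length=256 is an assumption; check the vocoder's hyperparameters.
    """
    return int(mel_length) * hop_length


def trim_padding(samples, mel_length, hop_length=256):
    """Keep only the samples that correspond to real (unpadded) mel frames.

    `samples` is any 1-D sliceable sequence, e.g. a list or a 1-D tensor.
    """
    return samples[: valid_sample_count(mel_length, hop_length)]


# Stand-in data: 1024 samples of which only 3 frames (768 samples) are real.
padded = list(range(1024))
trimmed = trim_padding(padded, mel_length=3)
```

With the objects from the examples above, this might be applied per item as `trim_padding(waveforms[i].squeeze(0), mel_lengths[i])` after calling `hifi_gan.decode_batch(mel_outputs)`.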

### Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
