---
language:
- lg
- sw
tags:
- text-to-speech
- TTS
- speech-synthesis
- VITS
license: cc-by-4.0
datasets:
- mozilla-foundation/common_voice_13_0
pipeline_tag: text-to-speech
---
# Text-to-Speech (TTS) with VITS trained on Kiswahili and Luganda Common Voice
This repository provides everything needed to run Text-to-Speech (TTS) with Coqui TTS, using a [VITS](https://arxiv.org/abs/2106.06103) model fine-tuned on Kiswahili and Luganda Common Voice v13 recordings from six speakers with similar intonation.
The pre-trained model takes text as input and produces a waveform (audio) as output.
# How to Synthesize Speech using our models
First, install the Coqui TTS package:
```
pip install TTS
```
### Perform Text-to-Speech (TTS)
```python
from TTS.utils.synthesizer import Synthesizer

# Load the VITS checkpoint and its configuration file.
# No separate speakers file, languages file, vocoder, or encoder is passed.
synthesizer = Synthesizer(
    tts_checkpoint="<model checkpoint path>",
    tts_config_path="<model configuration file>",
)

sentence_to_synthesize = "Your Kiswahili or Luganda sentence here"

if sentence_to_synthesize:
    print(sentence_to_synthesize)
    # Generate the waveform; no speaker name, language name, or reference audio is given.
    wav = synthesizer.tts(sentence_to_synthesize)
    location = "output.wav"  # Choose a desired name for the output file
    synthesizer.save_wav(wav, location)
```
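To synthesize several sentences in one go, the same two calls (`tts` and `save_wav`) can be wrapped in a small loop. The sketch below is only an illustration: the helper name `synthesize_batch`, the `outputs` directory, and the `sentence_NNN.wav` file naming are assumptions of this example and are not part of Coqui TTS or of this repository.

```python
from pathlib import Path


def synthesize_batch(synthesizer, sentences, out_dir="outputs"):
    """Synthesize each non-empty sentence to its own WAV file.

    `synthesizer` is any object exposing `tts(text)` and `save_wav(wav, path)`,
    such as the Synthesizer instance created above.
    Returns the list of written file paths.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for index, sentence in enumerate(sentences):
        text = sentence.strip()
        if not text:
            continue  # Skip blank entries; the index still advances
        wav = synthesizer.tts(text)
        # Hypothetical naming scheme: sentence_000.wav, sentence_001.wav, ...
        target = out_path / f"sentence_{index:03d}.wav"
        synthesizer.save_wav(wav, str(target))
        written.append(target)
    return written


# Example usage (requires the synthesizer loaded as shown above):
# paths = synthesize_batch(synthesizer, ["First sentence", "Second sentence"])
```

Because the file index follows the position in the input list, a skipped blank entry leaves a gap in the numbering, which keeps each file traceable to its source line.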
### Limitations
We do not provide any warranty on the performance achieved by this model when used on other datasets.
# **Citing**
Please cite our work if you use our models in your research or business.
```bibtex
@inproceedings{buildingTTS,
title={Building a Luganda Text-to-Speech Model from Crowdsourced Data},
author={Kagumire, Sulaiman and Katumba, Andrew and Nakatumba-Nabende, Joyce and Quinn, John},
booktitle={5th Workshop on African Natural Language Processing},
year={2024}
}
```