---
language: sw
license: cc-by-sa-4.0
tags:
- audio
- text-to-speech
inference: false
datasets:
- bookbot/OpenBible_Swahili
---

# VITS Base sw-KE-OpenBible

VITS Base sw-KE-OpenBible is an end-to-end text-to-speech model based on the [VITS](https://arxiv.org/abs/2106.06103) architecture. The model was trained from scratch on a real audio dataset. The list of real speakers includes:

- sw-KE-OpenBible

The model's [vocabulary](https://huggingface.co/bookbot/vits-base-sw-KE-OpenBible/blob/main/symbols.py) contains the different IPA phonemes found in [gruut](https://github.com/rhasspy/gruut).

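As a rough illustration of how a phoneme vocabulary is typically used, the sketch below maps phoneme symbols to integer IDs before they are fed to the model. The symbol inventory shown here is hypothetical and heavily abbreviated; the real inventory is the one defined in `symbols.py`. Skipping unknown symbols is a choice made for this sketch, not necessarily what the training code does.

```python
from typing import Dict, List

# Hypothetical, abbreviated symbol inventory for illustration only;
# the model's real inventory is defined in its symbols.py.
PAD = "_"
PUNCTUATION = [" ", ",", ".", "!", "?"]
PHONEMES = ["a", "b", "h", "i", "m", "n", "r", "ɑ", "ɓ", "ɛ", "ɔ"]
SYMBOLS: List[str] = [PAD] + PUNCTUATION + PHONEMES

# Each symbol's ID is simply its position in the inventory.
SYMBOL_TO_ID: Dict[str, int] = {s: i for i, s in enumerate(SYMBOLS)}


def phonemes_to_ids(phonemes: List[str]) -> List[int]:
    """Map phoneme symbols to integer IDs, skipping unknown symbols."""
    return [SYMBOL_TO_ID[p] for p in phonemes if p in SYMBOL_TO_ID]


ids = phonemes_to_ids(["h", "ɑ", "ɓ", "ɑ", "r", "i"])
print(ids)
```

Because IDs depend on symbol order, inference must use exactly the same inventory, in the same order, as training.
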
This model was trained using the [VITS](https://github.com/jaywalnut310/vits) framework. All training was done on a Scaleway L40S VM with an NVIDIA L40S GPU. All scripts needed for training can be found in the [Files and versions](https://huggingface.co/bookbot/vits-base-sw-KE-OpenBible/tree/main) tab, along with the [Training metrics](https://huggingface.co/bookbot/vits-base-sw-KE-OpenBible/tensorboard) logged via TensorBoard.

## Model

| Model                     | SR (Hz) | Mel range (Hz) | FFT / Hop / Win   | #epochs |
| ------------------------- | ------- | -------------- | ----------------- | ------- |
| VITS Base sw-KE-OpenBible | 44.1K   | 0-null         | 2048 / 512 / 2048 | 12000   |

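
To make the spectrogram settings in the table more concrete, the short sketch below derives the window duration, hop duration, frame rate, and frequency resolution they imply. The variable names are illustrative and not taken from the training config. The Nyquist line assumes that a `null` upper mel bound falls back to half the sample rate, as librosa's mel filter bank does when `fmax` is `None`.

```python
SAMPLE_RATE = 44_100  # Hz, "44.1K" in the table
N_FFT = 2048          # FFT size
HOP_LENGTH = 512      # samples between consecutive frames
WIN_LENGTH = 2048     # analysis window length in samples

window_ms = WIN_LENGTH / SAMPLE_RATE * 1000   # duration of one analysis window
hop_ms = HOP_LENGTH / SAMPLE_RATE * 1000      # time step between frames
frames_per_second = SAMPLE_RATE / HOP_LENGTH  # spectrogram frame rate
freq_bins = N_FFT // 2 + 1                    # linear-frequency bins per frame
bin_width_hz = SAMPLE_RATE / N_FFT            # spacing between linear bins
nyquist_hz = SAMPLE_RATE / 2                  # assumed upper mel bound for "null"

print(f"window: {window_ms:.2f} ms, hop: {hop_ms:.2f} ms")
print(f"frame rate: {frames_per_second:.2f} frames/s")
print(f"{freq_bins} bins, {bin_width_hz:.2f} Hz apart, up to {nyquist_hz:.0f} Hz")
```

With these settings each window spans about 46 ms of audio and consecutive windows overlap by 75%, since the hop is a quarter of the window length.
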
## Training procedure

### Prepare Data

```sh
python preprocess.py \
    --text_index 1 \
    --filelists filelists/sw-KE-OpenBible_text_train_filelist.txt filelists/sw-KE-OpenBible_text_val_filelist.txt \
    --text_cleaners swahili_cleaners
```

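For orientation, the sketch below shows roughly what this preprocessing step does in the upstream VITS code: each filelist line is split on `|`, the field at `text_index` is passed through a text cleaner, and the line is joined back together. The cleaner here is a placeholder that only lowercases and collapses whitespace; it is not `swahili_cleaners`. The file path and sentence are hypothetical examples.

```python
import re
from typing import Callable, List


def placeholder_cleaner(text: str) -> str:
    """Stand-in cleaner: lowercase and collapse whitespace (NOT swahili_cleaners)."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def clean_filelist(
    lines: List[str],
    text_index: int = 1,
    cleaner: Callable[[str], str] = placeholder_cleaner,
) -> List[str]:
    """Clean the text field of each pipe-separated filelist line."""
    cleaned = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        fields[text_index] = cleaner(fields[text_index])
        cleaned.append("|".join(fields))
    return cleaned


# Hypothetical filelist entry: audio path at index 0, transcript at index 1.
example = ["wavs/hypothetical_0001.wav|Hapo  Mwanzo Mungu aliumba mbingu na nchi."]
cleaned_example = clean_filelist(example)
print(cleaned_example)
```

With `--text_index 1`, the audio path in the first field is left untouched and only the transcript is rewritten.
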
### Train

```sh
python train.py -c configs/sw_ke_openbible_base.json -m sw_ke_openbible_base
```

## Frameworks

- PyTorch 2.2.2
- [VITS](https://github.com/bookbot-hive/vits)