
Using the COQUI-AI TTS repo https://github.com/coqui-ai/TTS

The default Spanish VITS model is distributed in .tar format, so I used the fine-tuning functionality, but with only 1 epoch and all learning rates set to 0.0.


- Clone the repo.

- Follow the installation instructions; currently that is: cd /repo/path && pip install -e .[all,dev,notebooks]

- Generate any text with the Spanish VITS model so that the model gets downloaded; currently: tts --model_name tts_models/es/css10/vits --text "hola hola hola" --out_path /anywhere/

- Go to the directory of the downloaded model, normally somewhere under ~/.local/share/tts/

- Copy config.json and model.pth.tar somewhere you can edit them.

- Customize config.json to train for only 1 epoch with all learning rates set to 0.0 (including lr_gen & lr_disc), and fill in the dataset settings for your database. Even though you don't actually want to train, you need some dataset configured so the trainer can act as if it were going to train (see the config sketch after this list).

- Fine-tune; currently: CUDA_VISIBLE_DEVICES="0" python /path/to/repo/TTS/TTS/bin/train_tts.py --config_path /path/to/custom/config/config.json --restore_path /path/to/model/model_file.pth.tar --use_cuda True
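
For reference, the part of config.json I changed looked roughly like this. This is only a sketch: exact field names can differ between TTS versions (for example, the dataset entry may use "formatter" instead of "name"), and the dataset name, path and metadata file are placeholders for whatever database you point it at:

    {
        "epochs": 1,
        "lr_gen": 0.0,
        "lr_disc": 0.0,
        "datasets": [
            {
                "name": "ljspeech",
                "path": "/path/to/your/dataset/",
                "meta_file_train": "metadata.csv"
            }
        ]
    }

Everything else in the downloaded config.json stays as it is; the only goal is that the trainer can load some dataset, run a single epoch without changing any weights, and write the model back out.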


Some troubles I found and how I solved them:

- If training throws an error like "Vits has no disc", set initialize_disc to true in config.json.

- It always needs at least one file for evaluation, so set eval_split_size so that at least one file ends up in the eval split; for me it crashed when running with no evaluation data.

- If it throws an error about having no more data, check the data filters in config.json, such as the min and max length settings. It could be that all of your audio and/or text data is being filtered out.
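
Putting those three fixes together, the relevant config.json values ended up looking roughly like this. Again just a sketch: the limits are placeholder numbers you should adapt so your own data is not filtered out, older configs may call the length filters min_seq_len / max_seq_len instead, and depending on the version initialize_disc may live inside the model arguments section rather than at the top level:

    {
        "initialize_disc": true,
        "eval_split_size": 0.1,
        "min_text_len": 1,
        "max_text_len": 500,
        "min_audio_len": 1,
        "max_audio_len": 500000
    }

Whatever values you use, make eval_split_size large enough that at least one file lands in the eval split.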