
Using the COQUI-AI TTS repo https://github.com/coqui-ai/TTS

The default Spanish VITS model is distributed in .tar format, so I used the fine-tuning functionality, but with only 1 epoch and all learning rates set to 0.0.


- Clone the repo.

- Follow the installation instructions; currently that is: cd /repo/path && pip install -e .[all,dev,notebooks]

- Generate any text with the Spanish VITS model so that the model gets downloaded; currently: tts --model_name tts_models/es/css10/vits --text "hola hola hola" --out_path /anywhere/

- Go to the directory of the downloaded model, normally somewhere under ~/.local/share/tts/

- Copy config.json and model.pth.tar somewhere you can edit them.

- Customize config.json to train for only 1 epoch with all learning rates set to 0.0 (including lr_gen & lr_disc), and fill in the dataset settings for your database. Even though you don't actually want to train, you need some dataset configured so the trainer can act as if it were going to train (see the config sketch after this list).

- Fine-tune; currently: CUDA_VISIBLE_DEVICES="0" python /path/to/repo/TTS/TTS/bin/train_tts.py --config_path /path/to/custom/config/config.json --restore_path /path/to/model/model_file.pth.tar --use_cuda True
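
For reference, the part of config.json I changed looked roughly like this. This is only a sketch: exact field names can differ between TTS versions (for example, the dataset entry may use "formatter" instead of "name"), and the dataset name, path and metadata file are placeholders for whatever database you point it at:

    {
        "epochs": 1,
        "lr_gen": 0.0,
        "lr_disc": 0.0,
        "datasets": [
            {
                "name": "ljspeech",
                "path": "/path/to/your/dataset/",
                "meta_file_train": "metadata.csv"
            }
        ]
    }

Everything else in the downloaded config.json stays as it is; the only goal is that the trainer can load some dataset, run a single epoch without changing any weights, and write the model back out.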


Some troubles I found and how I solved them:

- If training throws an error like "Vits has no disc", set initialize_disc to true in config.json.

- It always needs at least one file for evaluation, so set eval_split_size so that at least one file ends up in the eval split; for me it crashed when running with no evaluation data.

- If it throws an error about having no more data, check the data filters in config.json, such as the min and max length settings. It could be that all of your audio and/or text data is being filtered out.
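
Putting those three fixes together, the relevant config.json values ended up looking roughly like this. Again just a sketch: the limits are placeholder numbers you should adapt so your own data is not filtered out, older configs may call the length filters min_seq_len / max_seq_len instead, and depending on the version initialize_disc may live inside the model arguments section rather than at the top level:

    {
        "initialize_disc": true,
        "eval_split_size": 0.1,
        "min_text_len": 1,
        "max_text_len": 500,
        "min_audio_len": 1,
        "max_audio_len": 500000
    }

Whatever values you use, make eval_split_size large enough that at least one file lands in the eval split.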