Models trained with VITS-fast-fine-tuning

  • Three speakers: laoliang (老梁), specialweek, zhongli.
  • The model is based on the C+J base model and was trained for 300 epochs on a single NVIDIA 3090, which took about 4.5 hours in total.
  • The training data consists of a single long audio recording of laoliang (~5 minutes) together with auxiliary data.

How to run the model?

  • Follow the official instructions to install the required libraries.
  • Download the models and move finetune_speaker.json and G_latest.pth to /path/to/VITS-fast-fine-tuning.
  • Run python VC_inference.py --model_dir ./G_latest.pth --share True to start a local Gradio inference demo.

File structure

VITS-fast-fine-tuning
├───VC_inference.py
├───...
├───finetune_speaker.json
└───G_latest.pth
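
Before launching the demo, it may help to confirm that the files are where the layout above expects them. The following is a minimal sketch, not part of VITS-fast-fine-tuning: the helper name missing_files and the REQUIRED_FILES tuple are hypothetical, and the file names are taken only from the structure shown above.

```python
from pathlib import Path

# File names come from the layout above; this helper is a hypothetical
# convenience check and is not shipped with VITS-fast-fine-tuning.
REQUIRED_FILES = ("VC_inference.py", "finetune_speaker.json", "G_latest.pth")


def missing_files(repo_dir):
    """Return the required file names that are not present in repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from inside the VITS-fast-fine-tuning directory.
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files found.")
```

If the script reports missing files, move them into the VITS-fast-fine-tuning directory as described in the steps above before running VC_inference.py.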