StyleTTS2 Fine-tuned Model

This model is a fine-tuned version of StyleTTS2, trained from the StyleTTS2-LibriTTS base model for text-to-speech.

Model Details

  • Base Model: StyleTTS2-LibriTTS
  • Architecture: StyleTTS2
  • Task: Text-to-Speech
  • Last Checkpoint: epoch_2nd_00004.pth

Training Details

  • Total Epochs: 5
  • Completed Epochs: 4
  • Total Iterations: 411
  • Batch Size: 2
  • Max Length: 120 (frames)
  • Learning Rate: 0.0001
  • Final Validation Loss: 0.430844

Loss Parameters

  • Diff Epoch: 10
  • Joint Epoch: 110
  • Lambda Parameters:
    • Mel: 5.0
    • F0: 1.0
    • Duration: 1.0
    • Style: 1.0
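
As a rough illustration of how parameters like these are commonly used, the sketch below combines per-term losses into a weighted sum with the lambda values listed above, and checks which optional training stages would be active at a given epoch. The function names, the example loss values, and the `epoch >= threshold` rule are assumptions made for illustration; the actual training script may combine additional terms or compare epochs differently.

```python
# Values copied from the Loss Parameters list above.
LAMBDAS = {"mel": 5.0, "f0": 1.0, "duration": 1.0, "style": 1.0}
DIFF_EPOCH = 10
JOINT_EPOCH = 110


def weighted_loss(losses, lambdas=LAMBDAS):
    """Return the sum of lambdas[name] * losses[name] over the given terms."""
    return sum(lambdas[name] * value for name, value in losses.items())


def active_stages(epoch, diff_epoch=DIFF_EPOCH, joint_epoch=JOINT_EPOCH):
    """Report which stages are on, ASSUMING an `epoch >= threshold` rule."""
    return {"diffusion": epoch >= diff_epoch, "joint": epoch >= joint_epoch}


if __name__ == "__main__":
    # Hypothetical per-term loss values, for illustration only.
    example = {"mel": 0.1, "f0": 0.2, "duration": 0.3, "style": 0.4}
    print(weighted_loss(example))
    print(active_stages(4))
```

Under the assumed rule, a run that stops after 4 or 5 epochs would reach neither the Diff Epoch nor the Joint Epoch threshold.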

Model Components

  • bert
  • bert_encoder
  • predictor
  • decoder
  • text_encoder
  • predictor_encoder
  • style_encoder
  • diffusion
  • text_aligner
  • pitch_extractor
  • mpd
  • msd
  • wd
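
A minimal sketch of how one might verify that a loaded checkpoint contains every component listed above. It assumes the checkpoint holds a dictionary keyed by component name; the helper `missing_components` is hypothetical, and the `"net"` key mentioned in the comments is an assumption about the checkpoint layout, not something stated in this card.

```python
# Component names copied from the Model Components list above.
EXPECTED_COMPONENTS = [
    "bert", "bert_encoder", "predictor", "decoder", "text_encoder",
    "predictor_encoder", "style_encoder", "diffusion", "text_aligner",
    "pitch_extractor", "mpd", "msd", "wd",
]


def missing_components(net, expected=EXPECTED_COMPONENTS):
    """Return the expected component names that are absent from `net`."""
    return [name for name in expected if name not in net]


if __name__ == "__main__":
    # With PyTorch installed, the dictionary could come from the checkpoint:
    #   import torch
    #   ckpt = torch.load("epoch_2nd_00004.pth", map_location="cpu")
    #   net = ckpt["net"]  # ASSUMPTION: weights are stored under a "net" key
    # A stand-in dictionary is used here so the sketch runs without the file.
    net = {name: {} for name in EXPECTED_COMPONENTS if name != "wd"}
    print(missing_components(net))
```

An empty list means all thirteen components are present; any names returned point to weights that would need to come from the base model instead.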

Training Metrics

A visualization of the training metrics is available in training_metrics.png.
