---
datasets:
- facebook/multilingual_librispeech
language:
- it
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
license: cc-by-4.0
library_name: f5-tts
---

This is an Italian fine-tune of F5-TTS.

> # UPDATE:
>
> # A better version with improved prosody is available here => https://huggingface.co/alien79/F5-TTS-italian

Italian only, so the model can't speak English properly.

Trained on 247+ hours of the Italian "train" split of the facebook/multilingual_librispeech dataset, at 6717 steps per epoch. Known limitations:

- catastrophic forgetting (the model forgot English)
- Italian pronunciation is not perfect (there are plenty of checkpoints to experiment with and to extend training from, possibly with different datasets)

# Current most trained model

`italian_59kh/model_464400.safetensors` (~70 epochs)
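
The epoch figure is the step number divided by the 6717 steps per epoch quoted above. A minimal sketch of that arithmetic, assuming the `model_<step>.<ext>` naming used by `model_464400.safetensors` (both helper names are hypothetical):

```python
import re

STEPS_PER_EPOCH = 6717  # from the training notes above


def step_from_filename(name: str) -> int:
    """Extract the training step from a name like 'model_464400.safetensors'."""
    match = re.search(r"model_(\d+)\.", name)
    if match is None:
        raise ValueError(f"no step number in {name!r}")
    return int(match.group(1))


def epochs_from_filename(name: str) -> float:
    """Approximate number of epochs completed at the saved step."""
    return step_from_filename(name) / STEPS_PER_EPOCH


print(round(epochs_from_filename("italian_59kh/model_464400.safetensors"), 1))  # prints 69.1
```

For `model_464400.safetensors` this gives about 69.1 epochs, which the card rounds to ~70.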

## Folder structure

```
italian_59kh/
└── checkpoints/
```

### italian_59kh

Contains the model weights saved at specific training steps: the higher the step number, the further along in training.

Weights in this folder can't be used to resume training; use the checkpoints instead.
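
Since the step number is what orders the files, picking the most advanced one should compare steps numerically rather than by filename. A minimal sketch, assuming files follow the `model_<step>.safetensors` pattern seen above (the helper `latest_by_step` is hypothetical; pass a different `suffix` if the checkpoint files use another extension):

```python
from __future__ import annotations

import re
from pathlib import Path


def latest_by_step(folder: str | Path, suffix: str = ".safetensors") -> Path | None:
    """Return the `model_<step><suffix>` file with the highest step, or None."""
    pattern = re.compile(r"model_(\d+)" + re.escape(suffix))
    best = None  # (step, path) of the most advanced file found so far
    for path in Path(folder).iterdir():
        match = pattern.fullmatch(path.name)
        if match is None:
            continue  # skip files that don't follow the naming pattern
        step = int(match.group(1))
        # Compare steps as integers: sorting names as strings would rank
        # "model_9000" above "model_10000".
        if best is None or step > best[0]:
            best = (step, path)
    return None if best is None else best[1]
```

For example, `latest_by_step("italian_59kh")` would return the weights file with the largest step in that folder.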

### italian_59kh/checkpoints

Contains the training checkpoints saved at specific steps: again, the higher the step number, the further along in training.

Checkpoints in this folder can be used as a starting point to continue training.

The `run.py` file is an example of how to extract the WAV files and produce the `metadata.csv` used for training.
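
The exact layout `run.py` writes is not shown here, so the following is only a minimal sketch of the same idea using the standard library. It assumes a pipe-separated two-column CSV (audio path, transcript) with a header row, which is the layout F5-TTS's `prepare_csv_wavs.py` script is assumed to read; the `audio_file|text` header, the `wavs/` subfolder, and the record tuple `(id, samples, sample_rate, transcript)` are all hypothetical stand-ins for the dataset rows. Check `run.py` for the format it actually produces.

```python
import csv
import struct
import wave
from pathlib import Path


def write_wav(path: Path, samples, sample_rate: int) -> None:
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    frames = b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def export_dataset(records, out_dir) -> Path:
    """Write one WAV per record plus a pipe-separated metadata.csv.

    Each record is a hypothetical (id, samples, sample_rate, transcript) tuple.
    """
    out_dir = Path(out_dir)
    (out_dir / "wavs").mkdir(parents=True, exist_ok=True)
    meta_path = out_dir / "metadata.csv"
    with open(meta_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="|")
        writer.writerow(["audio_file", "text"])  # assumed header row
        for rec_id, samples, sample_rate, transcript in records:
            rel = Path("wavs") / f"{rec_id}.wav"
            write_wav(out_dir / rel, samples, sample_rate)
            writer.writerow([rel.as_posix(), transcript.strip()])
    return meta_path
```

Paths in the CSV are written relative to the folder holding `metadata.csv`, so the whole output folder can be moved as a unit.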