Pre-trained Bert-VITS2-2.2-CLAP model. The model was trained on Chinese, Japanese, and English speech data; the speaker embedding has been removed due to copyright issues. When fine-tuning, you can mix in auxiliary data to avoid catastrophic forgetting of the model's multi-language and CLAP control abilities.
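A minimal sketch of the auxiliary-data idea: interleave a slice of multi-language auxiliary utterances with the target fine-tuning set so the model keeps seeing the abilities it would otherwise forget. The file-list format, `aux_ratio` value, and function names here are illustrative assumptions, not part of the released model or its training scripts.

```python
import random

def mix_datasets(primary, auxiliary, aux_ratio=0.3, seed=0):
    """Return a shuffled training list that adds a fraction of auxiliary
    (multi-language / CLAP-labelled) samples to the primary fine-tuning
    data to mitigate catastrophic forgetting.

    aux_ratio: auxiliary samples added, as a fraction of len(primary).
    (Both the ratio and the sample format are assumptions for this sketch.)
    """
    rng = random.Random(seed)
    n_aux = int(len(primary) * aux_ratio)
    mixed = list(primary) + rng.choices(auxiliary, k=n_aux)
    rng.shuffle(mixed)
    return mixed

# Hypothetical (language, filepath) training entries:
primary = [("zh", f"target_{i}.wav") for i in range(100)]
auxiliary = ([("ja", f"aux_ja_{i}.wav") for i in range(50)]
             + [("en", f"aux_en_{i}.wav") for i in range(50)])

train_list = mix_datasets(primary, auxiliary, aux_ratio=0.3)
```

The ratio trades off adaptation speed against retention: too little auxiliary data and the multi-language ability degrades; too much and fine-tuning on the target data slows down.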