wetdog committed on
Commit
63571fb
1 Parent(s): 96c711d

Update README.md

Files changed (1)
  1. README.md +12 -16
README.md CHANGED
@@ -3,18 +3,17 @@ language:
  - ca
  licence:
  - apache-2.0
  tags:
  - matcha-tts
  - acoustic modelling
  - speech
  - multispeaker
  pipeline_tag: text-to-speech
- datasets:
- - projecte-aina/festcat_trimmed_denoised
- - projecte-aina/openslr-slr69-ca-trimmed-denoised
  ---

- # Matcha-TTS Catalan Multispeaker

  ## Table of Contents
  <details>
@@ -53,7 +52,7 @@ This may be due to the sensitivity of the model in learning specific frequencies

  ### Installation

- This model has been trained using the espeak-ng open source text-to-speech software.
  The espeak-ng fork containing the Catalan phonemizer can be found [here](https://github.com/projecte-aina/espeak-ng).

  Create a virtual environment:
@@ -123,11 +122,10 @@ python3 matcha_vocos_inference.py --output_path=/output/path --text_input="Bon d

  #### ONNX

- We also release a ONNX version of the model

  ### For Training
-
- The entire checkpoint is also released to continue training or finetuning.
  See the [repo instructions](https://github.com/langtech-bsc/Matcha-TTS/tree/dev-cat)

@@ -135,25 +133,23 @@ See the [repo instructions](https://github.com/langtech-bsc/Matcha-TTS/tree/dev-

  ### Training data

- The model was trained on 2 **Catalan** speech datasets

  | Dataset | Language | Hours | Num. Speakers |
  |---------------------|----------|---------|-----------------|
- | [Festcat](https://huggingface.co/datasets/projecte-aina/festcat_trimmed_denoised) | ca | 22 | 11 |
- | [OpenSLR69](https://huggingface.co/datasets/projecte-aina/openslr-slr69-ca-trimmed-denoised) | ca | 5 | 36 |

  ### Training procedure

- ***Catalan Matcha-TTS*** was finetuned from the English multispeaker checkpoint,
- which was trained with the [VCTK dataset](https://huggingface.co/datasets/vctk) and provided by the model authors.

- The embedding layer was initialized with the number of catalan speakers (47) and the original hyperparameters were kept.

  ### Training Hyperparameters

  * batch size: 32 (x2 GPUs)
  * learning rate: 1e-4
- * number of speakers: 47
  * n_fft: 1024
  * n_feats: 80
  * sample_rate: 22050
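As a quick cross-check of the figures above, the per-dataset speaker counts in the removed table sum exactly to the 47 speakers the old README used to size the embedding layer; a minimal sketch:

```python
# Per-dataset stats from the (now removed) Catalan training-data table.
datasets = {
    "Festcat": {"hours": 22, "speakers": 11},
    "OpenSLR69": {"hours": 5, "speakers": 36},
}

total_speakers = sum(d["speakers"] for d in datasets.values())  # 47
total_hours = sum(d["hours"] for d in datasets.values())        # 27
print(total_speakers, total_hours)  # prints: 47 27
```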
@@ -174,7 +170,7 @@ Validation values obtained from tensorboard from epoch 2399*:

  * val_prior_loss_epoch: 0.97
  * val_diff_loss_epoch: 2.195

- (Note that the finetuning started from epoch 1864, as previous ones were trained with VCTK dataset)

  ## Citation

  - ca
  licence:
  - apache-2.0
+ base_model: BSC-LT/matcha-tts-cat-multispeaker
  tags:
  - matcha-tts
  - acoustic modelling
  - speech
  - multispeaker
  pipeline_tag: text-to-speech
+
  ---

+ # Matcha-TTS Catalan Multiaccent

  ## Table of Contents
  <details>
 

  ### Installation

+ The models were trained using the espeak-ng open source text-to-speech software.
  The espeak-ng fork containing the Catalan phonemizer can be found [here](https://github.com/projecte-aina/espeak-ng).

  Create a virtual environment:
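The virtual-environment step referenced here might look like the sketch below; the environment name `matcha-venv` is illustrative, and the actual dependency-install commands live in the repo instructions linked above:

```shell
# Create and activate a virtual environment (name is illustrative).
python3 -m venv matcha-venv
source matcha-venv/bin/activate
python -V  # the venv's interpreter is now first on PATH
```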
 

  #### ONNX

+ We also release ONNX versions of the models.

  ### For Training
+

  See the [repo instructions](https://github.com/langtech-bsc/Matcha-TTS/tree/dev-cat)

 

  ### Training data

+ The model was trained on a **Multiaccent Catalan** speech dataset:

  | Dataset | Language | Hours | Num. Speakers |
  |---------------------|----------|---------|-----------------|
+ | Lafrescat (coming soon) | ca | 3.5 | 8 |

  ### Training procedure

+ ***Multiaccent Catalan Matcha-TTS*** was finetuned from the Catalan Central [multispeaker checkpoint](https://huggingface.co/BSC-LT/matcha-tts-cat-multispeaker).

+ The embedding layer was initialized with the number of Catalan speakers per accent (2), and the original hyperparameters were kept.

  ### Training Hyperparameters

  * batch size: 32 (x2 GPUs)
  * learning rate: 1e-4
+ * number of speakers: 2
  * n_fft: 1024
  * n_feats: 80
  * sample_rate: 22050
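The embedding-layer re-initialization described above can be pictured with a pure-Python sketch. Only the speaker count (2 per accent) comes from the README; `init_speaker_embedding`, the embedding dimension (64), and the Gaussian init are illustrative assumptions, not values from the released checkpoint:

```python
import random

# Hypothetical sketch: size a fresh speaker-embedding table for finetuning.
# n_speakers=2 matches the per-accent speaker count; emb_dim and the
# init scheme are assumptions.
def init_speaker_embedding(n_speakers: int, emb_dim: int = 64, seed: int = 0):
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)]
            for _ in range(n_speakers)]

table = init_speaker_embedding(n_speakers=2)
# table[i] is the embedding vector for speaker id i.
```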
 
  * val_prior_loss_epoch: 0.97
  * val_diff_loss_epoch: 2.195

+

  ## Citation