Alex changes #9, opened by martillopartbsc

about.md CHANGED
@@ -207,13 +207,17 @@ Together, these technologies form a comprehensive TTS solution tailored to the n
|
|
207 |
|
208 |
## The model in detail
|
209 |
|
210 |
-
**Matcha-TTS** is
|
211 |
-
|
212 |
-
|
213 |
-
|
214 |
-
|
215 |
-
|
216 |
-
|
217 |
|
218 |
## Adaptation to Catalan
|
219 |
|
|
|
207 |
|
208 |
## The model in detail
|
209 |
|
210 |
+
**Matcha-TTS** is a non-autoregressive encoder-decoder model designed for fast acoustic modelling in TTS.
|
211 |
+
The encoder processes input phoneme sequences and, together with a phoneme duration predictor, outputs averaged acoustic features. The decoder,
|
212 |
+
which is essentially a U-Net backbone based on the Transformer architecture, predicts the refined spectrogram.
|
213 |
+
The model is trained with optimal-transport conditional flow matching.
|
214 |
+
This yields an ODE-based decoder capable of high output quality in fewer synthesis steps than models trained with score matching.
|
215 |
+
|
216 |
+
**Vocos** is a fast neural vocoder designed to synthesize audio waveforms from acoustic features.
|
217 |
+
Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain.
|
218 |
+
Instead, it generates spectral coefficients, which allows rapid audio reconstruction through the inverse Fourier transform.
|
219 |
+
The goal of this model is to provide an alternative to HiFi-GAN that is faster and compatible with the acoustic output of several TTS models.
|
220 |
+
This version is tailored for the Catalan language, as it was trained only on Catalan speech datasets.
|
221 |
|
222 |
## Adaptation to Catalan
|
223 |
|
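Note on the added Matcha-TTS paragraph: the claim about an ODE-based decoder needing few synthesis steps can be illustrated with a toy sketch. The code below is not the Matcha-TTS decoder. It assumes a hand-written conditional optimal-transport vector field in place of the trained U-Net, and the names `toy_vector_field`, `euler_sample`, and the value of `SIGMA_MIN` are hypothetical, chosen only for illustration. It shows that when the probability path is a straight line, a fixed-step Euler solver reaches the same endpoint with 2 steps as with 50.

```python
import math
import random

# Assumed small noise floor for the toy path; not taken from the PR.
SIGMA_MIN = 1e-4


def toy_vector_field(x, t, target):
    """Stand-in for the learned decoder: the conditional OT vector field
    that moves a sample x at time t toward a fixed `target` vector."""
    scale = 1.0 - (1.0 - SIGMA_MIN) * t
    return [(y - (1.0 - SIGMA_MIN) * xi) / scale for xi, y in zip(x, target)]


def euler_sample(x0, target, n_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        v = toy_vector_field(x, step * dt, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
# Hypothetical 8-dimensional "spectrogram frame" used as the target.
target = [math.sin(0.4 * k) for k in range(8)]
# Sampling starts from Gaussian noise, as in flow-matching models.
noise = [random.gauss(0.0, 1.0) for _ in range(8)]

few_steps = euler_sample(noise, target, n_steps=2)
many_steps = euler_sample(noise, target, n_steps=50)

gap = max(abs(a - b) for a, b in zip(few_steps, many_steps))
print(f"max difference between 2-step and 50-step results: {gap:.2e}")
```

In a real model the learned field is only approximately straight, so more than one step is typically used; the sketch only illustrates why straight paths make coarse step sizes viable.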