emoji added

--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ pipeline_tag: text-to-speech
 license: cc-by-nc-4.0
 ---
 
-# Matxa-TTS (Matcha-TTS) Catalan Multiaccent
+# 🍵 Matxa-TTS (Matcha-TTS) Catalan Multiaccent
 
 ## Table of Contents
 <details>
@@ -30,7 +30,7 @@ license: cc-by-nc-4.0
 
 ## Model Description
 
-**Matxa-TTS** is based on **Matcha-TTS** that is an encoder-decoder architecture designed for fast acoustic modelling in TTS.
+🍵 **Matxa-TTS** is based on **Matcha-TTS** that is an encoder-decoder architecture designed for fast acoustic modelling in TTS.
 The encoder part is based on a text encoder and a phoneme duration prediction that together predict averaged acoustic features.
 And the decoder has essentially a U-Net backbone inspired by [Grad-TTS](https://arxiv.org/pdf/2105.06337.pdf), which is based on the Transformer architecture.
 In the latter, by replacing 2D CNNs by 1D CNNs, a large reduction in memory consumption and fast synthesis is achieved.