Update README.md
README.md
CHANGED
@@ -14,53 +14,145 @@ tags:
|
|
14 |
|
15 |
---
|
16 |
|
17 |
-
# Celtia:
|
18 |
## Model description
|
19 |
|
20 |
-
|
21 |
|
22 |
-
A live inference demo can be found in our official page, [here](https://tts.nos.gal/).
|
23 |
|
24 |
-
This model was trained using graphemes. A preprocessing with the [Cotovía](http://gtm.uvigo.es/en/transfer/software/cotovia/) tool is needed for the input text.
|
25 |
|
26 |
## Intended uses and limitations
|
27 |
|
28 |
You can use this model to generate synthetic speech in Galician.
|
29 |
|
30 |
-
##
|
31 |
-
### Usage
|
32 |
|
33 |
-
|
34 |
|
35 |
-
|
36 |
|
37 |
```bash
|
38 |
sudo dpkg -i cotovia_0.5_amd64.deb
|
39 |
sudo dpkg -i cotovia-lang-gl_0.5_all.deb
|
40 |
```
|
|
|
41 |
|
42 |
-
|
43 |
|
44 |
```bash
|
45 |
-
|
46 |
```
|
47 |
|
48 |
-
|
49 |
|
50 |
-
|
51 |
|
52 |
```bash
|
53 |
-
|
54 |
```
|
55 |
|
56 |
-
|
|
57 |
|
58 |
-
```bash
|
59 |
-
python preprocess.py text model_path config_path
|
60 |
```
|
61 |
|
62 |
-
This
|
63 |
|
64 |
|
65 |
## Training
|
66 |
|
---

# Celtia: Nós Project's Galician TTS Model
## Model description

**Celtia** is a Galician TTS model created under the [Nós project](https://nos.gal/gl/proxecto-nos). It was trained from scratch using the [Coqui TTS](https://github.com/coqui-ai/TTS) Python library on the corpus [Nos_Celtia-GL](https://zenodo.org/record/7716958). This corpus comprises a total of 20,000 sentences recorded by a professional voice talent. Specifically, a subset of 13,000 sentences, corresponding to 15.5 hours of speech, was used to train the model.

The model was trained directly on grapheme inputs, so no phonetic transcription is required. The [Cotovía](http://gtm.uvigo.es/en/transfer/software/cotovia/) tool can be used to normalize the input text.

You can test the model in our live inference demo ([Nós-TTS](https://tts.nos.gal/)) or in our spaces ([Galician TTS](https://huggingface.co/spaces/proxectonos/Nos_TTS_galician)).

<!-- The model can be tested using our online demo, [Nós-TTS](https://tts.nos.gal/), or in our spaces, [Galician TTS](https://huggingface.co/spaces/proxectonos/Nos_TTS_galician).-->

## Intended uses and limitations

You can use this model to generate synthetic speech in Galician.

## Installation

### Cotovía

For text normalization, you can use the front-end of Cotovía. This software is available for download on the [SourceForge](https://sourceforge.net/projects/cotovia/files/Debian%20packages/) website. The required Debian packages are `cotovia_0.5_amd64.deb` and `cotovia-lang-gl_0.5_all.deb`, which can be installed using the following commands:

```bash
sudo dpkg -i cotovia_0.5_amd64.deb
sudo dpkg -i cotovia-lang-gl_0.5_all.deb
```
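
After installing the packages, it can help to confirm that the executables used later in this README are reachable. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper built on `shutil.which` from the standard library, and it assumes the Debian packages place a `cotovia` executable on the `PATH`, as the command-line example suggests.

```python
import shutil


def missing_tools(names=("cotovia", "iconv")):
    """Return the executables from `names` that are not found on PATH.

    `missing_tools` is a hypothetical helper, not part of Cotovía or Coqui TTS.
    """
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("cotovia and iconv are available")
```

An empty result only means the commands were found; it does not check that the Galician language package is installed.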

### TTS library

To synthesize speech, you need to install the Coqui TTS library:

```bash
pip install TTS
```

## How to use

### Command-line usage

The following command normalizes and synthesizes the input text using the Celtia model:

```bash
echo "Son Celtia, unha voz creada con intelixencia artificial" \
  | cotovia -p -n -S \
  | iconv -f iso88591 -t utf8 \
  | tts --text "$(cat -)" --model_path celtia.pth --config_path celtia_config.json --out_path celtia.wav
```

The synthesized speech is saved to the file given by `--out_path` (here, `celtia.wav`).
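
The pipeline converts Cotovía's output from ISO-8859-1 to UTF-8, which suggests that the tool works in that encoding. Under that assumption, characters outside ISO-8859-1 (typographic quotes or the euro sign, for example) may not survive the conversion. The sketch below shows one way to spot them before normalization; `unsupported_chars` is a hypothetical helper, not part of Cotovía or Coqui TTS.

```python
def unsupported_chars(text, encoding="latin-1"):
    """Return the distinct characters of `text` that `encoding` cannot represent.

    "latin-1" is Python's name for ISO-8859-1. Hypothetical helper for illustration.
    """
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


if __name__ == "__main__":
    sample = "Son Celtia, unha voz creada con intelixencia artificial"
    print(unsupported_chars(sample))
    print(unsupported_chars("“Ola” custa 5 €"))
```

How to handle the reported characters (replace them, drop them, or reject the input) is left to the caller.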

### Python usage

Normalization and synthesis can also be performed within Python:

```python
import argparse
import random
import string
import subprocess

from TTS.utils.synthesizer import Synthesizer


def sanitize_filename(filename):
    """Drop any characters that are not allowed in file names."""
    return ''.join(c for c in filename if c.isalnum() or c in (' ', '_', '-')).rstrip()


def to_cotovia(text):
    """Normalize `text` with Cotovía and return the output lines."""
    # Random suffix so that temporary files from different runs do not collide
    res = ''.join(random.choices(string.ascii_lowercase + string.digits, k=5))

    # Input and output Cotovía files
    COTOVIA_IN_TXT_PATH = res + '.txt'
    COTOVIA_IN_TXT_PATH_ISO = 'iso8859-1' + res + '.txt'
    COTOVIA_OUT_PRE_PATH = 'iso8859-1' + res + '.pre'
    COTOVIA_OUT_PRE_PATH_UTF8 = 'utf8' + res + '.pre'

    with open(COTOVIA_IN_TXT_PATH, 'w', encoding='utf-8') as f:
        f.write(text + '\n')

    # utf-8 to iso8859-1
    subprocess.run(["iconv", "-f", "utf-8", "-t", "iso8859-1", COTOVIA_IN_TXT_PATH, "-o", COTOVIA_IN_TXT_PATH_ISO], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
    # Cotovía writes its preprocessed output next to the input, with a .pre extension
    subprocess.run(["cotovia", "-i", COTOVIA_IN_TXT_PATH_ISO, "-p"], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
    # iso8859-1 back to utf-8
    subprocess.run(["iconv", "-f", "iso8859-1", "-t", "utf-8", COTOVIA_OUT_PRE_PATH, "-o", COTOVIA_OUT_PRE_PATH_UTF8], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)

    segs = []
    try:
        with open(COTOVIA_OUT_PRE_PATH_UTF8, 'r', encoding='utf-8') as f:
            segs = [line.rstrip() for line in f]
    except (OSError, UnicodeDecodeError):
        print("ERROR: Couldn't read cotovia output")

    # Remove the temporary files
    subprocess.run(["rm", COTOVIA_IN_TXT_PATH, COTOVIA_IN_TXT_PATH_ISO, COTOVIA_OUT_PRE_PATH, COTOVIA_OUT_PRE_PATH_UTF8], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)

    return segs


def text_preprocess(text):
    cotovia_preproc_text = to_cotovia(text)

    # convert list to string
    cotovia_preproc_text_res = ' '.join(cotovia_preproc_text)

    # add final punctuation if missing (skipped when Cotovía returned nothing)
    if cotovia_preproc_text_res and cotovia_preproc_text_res[-1] not in string.punctuation:
        cotovia_preproc_text_res += '.'

    return cotovia_preproc_text_res


def main():
    parser = argparse.ArgumentParser(description='Normalize text with Cotovía and synthesize it with Celtia')
    parser.add_argument('text', type=str, help='Text to synthesize')
    parser.add_argument('model_path', type=str, help='Absolute path to the model checkpoint.pth')
    parser.add_argument('config_path', type=str, help='Absolute path to the model config.json')

    args = parser.parse_args()

    print("Text before preprocessing: ", args.text)
    text = text_preprocess(args.text)
    print("Text after preprocessing: ", text)

    synthesizer = Synthesizer(
        args.model_path, args.config_path, None, None, None, None,
    )

    # Step 1: Use the first word of the text as the file name
    first_word = args.text.split()[0] if args.text.split() else "audio"
    # Sanitize it; fall back to "audio" if nothing valid is left
    first_word = sanitize_filename(first_word) or "audio"

    # Step 2: Use the synthesizer's built-in functions to synthesize and save the audio
    wavs = synthesizer.tts(text)
    filename = f"{first_word}.wav"
    synthesizer.save_wav(wavs, filename)

    print(f"Audio file saved as: {filename}")


if __name__ == "__main__":
    main()
```

This Python code takes an input text, normalizes it using Cotovía’s front-end, synthesizes speech from the normalized text, and saves the result as a `.wav` file named after the first word of the input.

A more advanced version, including additional text preprocessing, can be found in the script `synthesize.py`, available in this repository. You can use this script to synthesize speech from an input text as follows:

```bash
python synthesize.py text model_path config_path
```

## Training
