cmagui committed on
Commit
7b52fcd
1 Parent(s): 1e668a5

Update README.md

Files changed (1)
  1. README.md +109 -17
README.md CHANGED
@@ -14,53 +14,145 @@ tags:
14
 
15
  ---
16
 
17
- # Celtia: Nos Project's Galician TTS Model
18
  ## Model description
19
 
20
- This model was trained from scratch using the [Coqui TTS](https://github.com/coqui-ai/TTS) Python library on the corpus [Nos_Celtia-GL](https://zenodo.org/record/7716958).
 
 
 
 
 
 
21
 
22
- A live inference demo can be found in our official page, [here](https://tts.nos.gal/).
23
 
24
- This model was trained using graphemes. A preprocessing with the [Cotovía](http://gtm.uvigo.es/en/transfer/software/cotovia/) tool is needed for the input text.
25
 
26
  ## Intended uses and limitations
27
 
28
  You can use this model to generate synthetic speech in Galician.
29
 
30
- ## How to use
31
- ### Usage
32
 
33
- #### Cotovía preprocessor
34
 
35
- To generate phonetic transcriptions, the Cotovía tool is needed. The tool can be downloaded from the [SourceForge](https://sourceforge.net/projects/cotovia/files/Debian%20packages/) website. The required Debian packages are `cotovia_0.5_amd64.deb` and `cotovia-lang-gl_0.5_all.deb`, which can be installed with the following commands:
36
 
37
  ```bash
38
  sudo dpkg -i cotovia_0.5_amd64.deb
39
  sudo dpkg -i cotovia-lang-gl_0.5_all.deb
40
  ```
 
41
 
42
- The tool can be used to generate the phonetic transcription of the text. The following command can be used to generate the phonetic transcription of a text string:
43
 
44
  ```bash
45
- echo "Era unha avioneta... O piloto era pequeno, que se chega a ser dos grandes, tómbate!" | cotovia -p -n -S | iconv -f iso88591 -t utf8
46
  ```
47
 
48
- The output of the command is the phonetic transcription of the input text. This string may be used in the inference part, as shown next.
 
 
49
 
50
- Required libraries:
51
 
52
  ```bash
53
- pip install TTS
54
  ```
55
 
56
- Synthesize speech using Python and the script preprocess.py, available in this repository:
57
 
58
- ```bash
59
- python preprocess.py text model_path config_path
60
  ```
61
 
62
- This script takes a text input, preprocesses it with the cotovia tool, synthesizes speech from the preprocessed text, and saves the output as a .wav file.
 
 
63
 
 
 
 
64
 
65
  ## Training
66
 
 
14
 
15
  ---
16
 
17
+ # Celtia: Nós Project's Galician TTS Model
18
  ## Model description
19
 
20
+ **Celtia** is a Galician TTS model created under the [Nós project](https://nos.gal/gl/proxecto-nos). It was trained from scratch using the [Coqui TTS](https://github.com/coqui-ai/TTS) Python library on the corpus [Nos_Celtia-GL](https://zenodo.org/record/7716958). This corpus comprises a total of 20,000 sentences recorded by a professional voice talent. Specifically, a subset of 13,000 sentences, corresponding to 15.5 hours of speech, was used to train the model.
21
+
22
+ The model was trained directly on grapheme inputs, so no phonetic transcription is required. The [Cotovía](http://gtm.uvigo.es/en/transfer/software/cotovia/) tool can be used to normalize the input text.
23
+
24
+ You can test the model in our live inference demo ([Nós-TTS](https://tts.nos.gal/)) or in our spaces ([Galician TTS](https://huggingface.co/spaces/proxectonos/Nos_TTS_galician)).
25
+
26
+ <!-- The model can be tested using our online demo, [Nós-TTS](https://tts.nos.gal/), or in our spaces, [Galician TTS](https://huggingface.co/spaces/proxectonos/Nos_TTS_galician).-->
27
 
 
28
 
 
29
 
30
  ## Intended uses and limitations
31
 
32
  You can use this model to generate synthetic speech in Galician.
33
 
34
+ ## Installation
 
35
 
36
+ ### Cotovía
37
 
38
+ For text normalization, you can use the front-end of Cotovía. This software is available for download on the [SourceForge](https://sourceforge.net/projects/cotovia/files/Debian%20packages/) website. The required Debian packages are `cotovia_0.5_amd64.deb` and `cotovia-lang-gl_0.5_all.deb`, which can be installed using the following commands:
39
 
40
  ```bash
41
  sudo dpkg -i cotovia_0.5_amd64.deb
42
  sudo dpkg -i cotovia-lang-gl_0.5_all.deb
43
  ```
44
+ ### TTS library
45
 
46
+ To synthesize speech, you need to install the Coqui TTS library:
47
 
48
  ```bash
49
+ pip install TTS
50
  ```
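Before running the examples, it can help to confirm that the tools mentioned above are reachable. The following is a minimal sketch using only the Python standard library. The function name `check_environment` is hypothetical and not part of this repository. It only checks that the `cotovia` and `iconv` executables are on `PATH` and that a `TTS` package can be located, not that they work correctly.

```python
import importlib.util
import shutil


def check_environment():
    """Report whether the tools used in this README can be located.

    Hypothetical helper (an assumption, not part of the repository).
    """
    return {
        # Executables are looked up on PATH, as a shell would do
        "cotovia": shutil.which("cotovia") is not None,
        "iconv": shutil.which("iconv") is not None,
        # The Coqui TTS library is imported in Python as the `TTS` package
        "TTS": importlib.util.find_spec("TTS") is not None,
    }
```

Calling `print(check_environment())` shows one boolean per tool; a `False` entry suggests that the corresponding installation step still needs to be completed.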
51
 
52
+ ## How to use
53
+
54
+ ### Command-line usage
55
 
56
+ The following command normalizes and synthesizes the input text using the Celtia model:
57
 
58
  ```bash
59
+ echo "Son Celtia, unha voz creada con intelixencia artificial" | cotovia -p -n -S | iconv -f iso88591 -t utf8 | tts --text "$(cat -)" --model_path celtia.pth --config_path celtia_config.json --out_path celtia.wav
60
  ```
61
 
62
+ The output synthesized speech is saved to the specified audio file.
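To confirm that a file was written, the result can be inspected with Python's standard `wave` module. This is a small sketch, assuming the output is a standard PCM WAV file; the helper name `wav_summary` is hypothetical and not part of this repository.

```python
import wave


def wav_summary(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file.

    Hypothetical helper: the `wave` module only reads uncompressed PCM files,
    which is assumed to be the format of the synthesized output.
    """
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        n_frames = wav_file.getnframes()
    # Duration is the number of frames divided by frames per second
    return channels, sample_rate, n_frames / float(sample_rate)
```

For example, `wav_summary("celtia.wav")` would report the properties of the file produced by the command above.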
63
+
64
+
65
+ ### Python usage
66
+
67
+ Normalization and synthesis can also be performed within Python:
68
+
69
+ ```python
+ import argparse
+ import string
+ import subprocess
+ import uuid
+
+ from TTS.utils.synthesizer import Synthesizer
+
+
+ def sanitize_filename(filename):
+     """Remove or replace any characters that are not allowed in file names."""
+     return ''.join(c for c in filename if c.isalnum() or c in (' ', '_', '-')).rstrip()
+
+
+ def to_cotovia(text):
+     """Normalize text with Cotovía and return its output lines."""
+     # `res` was not defined in the original snippet; a random stem for the
+     # temporary files is assumed here.
+     res = uuid.uuid4().hex
+
+     # Input and output Cotovía files
+     COTOVIA_IN_TXT_PATH = res + '.txt'
+     COTOVIA_IN_TXT_PATH_ISO = 'iso8859-1' + res + '.txt'
+     COTOVIA_OUT_PRE_PATH = 'iso8859-1' + res + '.pre'
+     COTOVIA_OUT_PRE_PATH_UTF8 = 'utf8' + res + '.pre'
+
+     with open(COTOVIA_IN_TXT_PATH, 'w', encoding='utf-8') as f:
+         f.write(text + '\n')
+
+     # utf-8 to iso8859-1, Cotovía normalization, then iso8859-1 back to utf-8
+     subprocess.run(["iconv", "-f", "utf-8", "-t", "iso8859-1", COTOVIA_IN_TXT_PATH, "-o", COTOVIA_IN_TXT_PATH_ISO], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
+     subprocess.run(["cotovia", "-i", COTOVIA_IN_TXT_PATH_ISO, "-p"], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
+     subprocess.run(["iconv", "-f", "iso8859-1", "-t", "utf-8", COTOVIA_OUT_PRE_PATH, "-o", COTOVIA_OUT_PRE_PATH_UTF8], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
+
+     segs = []
+     try:
+         with open(COTOVIA_OUT_PRE_PATH_UTF8, 'r', encoding='utf-8') as f:
+             segs = [line.rstrip() for line in f]
+     except OSError:
+         print("ERROR: Couldn't read cotovia output")
+
+     # Remove the temporary files
+     subprocess.run(["rm", COTOVIA_IN_TXT_PATH, COTOVIA_IN_TXT_PATH_ISO, COTOVIA_OUT_PRE_PATH, COTOVIA_OUT_PRE_PATH_UTF8], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
+
+     return segs
+
+
+ def text_preprocess(text):
+     cotovia_preproc_text = to_cotovia(text)
+
+     # convert list to string
+     cotovia_preproc_text_res = ' '.join(cotovia_preproc_text)
+
+     # add final punctuation if missing (an empty result is left untouched)
+     if cotovia_preproc_text_res and cotovia_preproc_text_res[-1] not in string.punctuation:
+         cotovia_preproc_text_res += '.'
+
+     return cotovia_preproc_text_res
+
+
+ def main():
+     parser = argparse.ArgumentParser(description='Cotovía text normalisation')
+     parser.add_argument('text', type=str, help='Text to synthesize')
+     parser.add_argument('model_path', type=str, help='Absolute path to the model checkpoint.pth')
+     parser.add_argument('config_path', type=str, help='Absolute path to the model config.json')
+
+     args = parser.parse_args()
+
+     print("Text before preprocessing: ", args.text)
+     text = text_preprocess(args.text)
+     print("Text after preprocessing: ", text)
+
+     synthesizer = Synthesizer(
+         args.model_path, args.config_path, None, None, None, None,
+     )
+
+     # Step 1: Extract the first word from the text
+     words = args.text.split()
+     first_word = words[0] if words else "audio"
+     first_word = sanitize_filename(first_word)  # Sanitize to make it a valid filename
+
+     # Step 2: Use synthesizer's built-in function to synthesize and save the audio
+     wavs = synthesizer.tts(text)
+     filename = f"{first_word}.wav"
+     synthesizer.save_wav(wavs, filename)
+
+     print(f"Audio file saved as: {filename}")
+
+
+ if __name__ == "__main__":
+     main()
146
 
 
 
147
  ```
148
 
149
+ This Python code takes an input text, normalizes it using Cotovía’s front-end, synthesizes speech from the normalized text, and saves the synthetic output speech as a .wav file.
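The two pure-Python steps of that code, adding final punctuation and naming the output file after the first word, can be tried without Cotovía or a model. The sketch below restates them in isolation. The names `ensure_final_punctuation` and `output_filename` are hypothetical, and the fallback to `audio` when the first word sanitizes to an empty string is an assumption added here, not behavior taken from the repository.

```python
import string


def sanitize_filename(filename):
    """Keep only alphanumerics, spaces, underscores and hyphens."""
    return "".join(c for c in filename if c.isalnum() or c in (" ", "_", "-")).rstrip()


def ensure_final_punctuation(text):
    """Append a period when non-empty text does not end in ASCII punctuation."""
    if text and text[-1] not in string.punctuation:
        return text + "."
    return text


def output_filename(text):
    """Derive the .wav file name from the first word of the input text."""
    words = text.split()
    first_word = sanitize_filename(words[0]) if words else ""
    # Assumption: fall back to "audio" if nothing usable is left
    return f"{first_word or 'audio'}.wav"
```

With these helpers, `output_filename("Son Celtia, unha voz creada con intelixencia artificial")` gives `Son.wav`, and `ensure_final_punctuation("Ola mundo")` gives `Ola mundo.`.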
150
+
151
+ A more advanced version, including additional text preprocessing, can be found in the script `synthesize.py`, available in this repository. You can use this script to synthesize speech from an input text as follows:
152
 
153
+ ```bash
154
+ python synthesize.py text model_path config_path
155
+ ```
156
 
157
  ## Training
158