proxectonos
/

Nos_TTS-celtia-vits-graphemes

@@ -14,53 +14,145 @@ tags:
 ---
-# Celtia: Nos Project's Galician TTS Model
 ## Model description
-This model was trained from scratch using the [Coqui TTS](https://github.com/coqui-ai/TTS) Python library on the corpus [Nos_Celtia-GL](https://zenodo.org/record/7716958).
-A live inference demo can be found in our official page, [here](https://tts.nos.gal/).
-This model was trained using graphemes. A preprocessing with the [Cotovía](http://gtm.uvigo.es/en/transfer/software/cotovia/) tool is needed for the input text.
 ## Intended uses and limitations
 You can use this model to generate synthetic speech in Galician.
-## How to use
-### Usage
-#### Cotovía preprocessor
-To generate phonetic transcriptions, the Cotovía tool is needed. The tool can be downloaded from the [SourceForge](https://sourceforge.net/projects/cotovia/files/Debian%20packages/) website. The required Debian packages are `cotovia_0.5_amd64.deb` and `cotovia-lang-gl_0.5_all.deb`, which can be installed with the following commands:
 ```bash
 sudo dpkg -i cotovia_0.5_amd64.deb
 sudo dpkg -i cotovia-lang-gl_0.5_all.deb
 ```
-The tool can be used to generate the phonetic transcription of the text. The following command can be used to generate the phonetic transcription of a text string:
 ```bash
-echo "Era unha avioneta... O piloto era pequeno, que se chega a ser dos grandes, tómbate!" | cotovia -p -n -S | iconv -f iso88591 -t utf8
 ```
-The output of the command is the phonetic transcription of the input text. This string may be used in the inference part, as shown next.
-Required libraries:
 ```bash
-pip install TTS
 ```
-Synthesize speech using Python and the script `preprocess.py`, available in this repository:
-```bash
-python preprocess.py text model_path config_path
 ```
-This script takes a text input, preprocesses it with the cotovia tool, synthesizes speech from the preprocessed text, and saves the output as a .wav file.
 ## Training

 ---
+# Celtia: Nós Project's Galician TTS Model
 ## Model description
+**Celtia** is a Galician TTS model created under the [Nós project](https://nos.gal/gl/proxecto-nos). It was trained from scratch using the [Coqui TTS](https://github.com/coqui-ai/TTS) Python library on the corpus [Nos_Celtia-GL](https://zenodo.org/record/7716958). This corpus comprises a total of 20,000 sentences recorded by a professional voice talent. Specifically, a subset of 13,000 sentences, corresponding to 15.5 hours of speech, was used to train the model.
+The model was trained directly on grapheme inputs, so no phonetic transcription is required. The [Cotovía](http://gtm.uvigo.es/en/transfer/software/cotovia/) tool can be used to normalize the input text.
+You can test the model in our live inference demo ([Nós-TTS](https://tts.nos.gal/)) or in our spaces ([Galician TTS](https://huggingface.co/spaces/proxectonos/Nos_TTS_galician)).
+<!-- The model can be tested using our online demo, [Nós-TTS](https://tts.nos.gal/), or in our spaces, [Galician TTS](https://huggingface.co/spaces/proxectonos/Nos_TTS_galician).-->
 ## Intended uses and limitations
 You can use this model to generate synthetic speech in Galician.
+## Installation
+### Cotovía
+For text normalization, you can use the front-end of Cotovía. This software is available for download on the [SourceForge](https://sourceforge.net/projects/cotovia/files/Debian%20packages/) website. The required Debian packages are `cotovia_0.5_amd64.deb` and `cotovia-lang-gl_0.5_all.deb`, which can be installed using the following commands:
 ```bash
 sudo dpkg -i cotovia_0.5_amd64.deb
 sudo dpkg -i cotovia-lang-gl_0.5_all.deb
 ```
+### TTS library
+To synthesize speech, you need to install the Coqui TTS library:
 ```bash
+pip install TTS
 ```
+## How to use
+### Command-line usage
+The following command normalizes and synthesizes the input text using the Celtia model:
 ```bash
+echo "Son Celtia, unha voz creada con intelixencia artificial" | cotovia -p -n -S | iconv -f iso88591 -t utf8 | tts --text "$(cat -)" --model_path celtia.pth --config_path celtia_config.json --out_path celtia.wav
 ```
+The synthesized speech is saved to the audio file given by `--out_path`.
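The normalization half of this pipeline can also be approximated from Python by piping the text through a subprocess instead of a shell. The following is a minimal sketch, not part of the repository: the function name `normalize_via_pipe` and the `command` parameter are hypothetical, the Cotovía flags are copied from the command above, and the ISO-8859-1 encoding of the input is an assumption inferred from the `iconv` call.

```python
import subprocess
from typing import Sequence

# Flags copied from the shell command above. Passing a different command
# (for example ("cat",)) lets the function be exercised without Cotovía.
COTOVIA_CMD = ("cotovia", "-p", "-n", "-S")


def normalize_via_pipe(text: str, command: Sequence[str] = COTOVIA_CMD) -> str:
    """Send text to the command's stdin and return its decoded stdout."""
    # Assumption: the tool reads and writes ISO-8859-1, as the iconv call suggests.
    payload = (text + "\n").encode("iso-8859-1", errors="replace")
    result = subprocess.run(
        list(command),
        input=payload,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=True,  # raise CalledProcessError if the tool exits with an error
    )
    lines = result.stdout.decode("iso-8859-1").splitlines()
    # Join the non-empty output lines into a single string for the synthesizer.
    return " ".join(line.strip() for line in lines if line.strip())
```

With Cotovía installed, `normalize_via_pipe("Son Celtia, unha voz creada con intelixencia artificial")` should return the normalized text, which can then be passed to the `tts` command or to the Coqui `Synthesizer`.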
+### Python usage
+Normalization and synthesis can also be performed within Python:
+```python
+import argparse
+import random
+import string
+import subprocess
+from TTS.utils.synthesizer import Synthesizer
+def sanitize_filename(filename):
+    """Remove any characters that are not letters, digits, spaces, underscores, or hyphens."""
+    return ''.join(c for c in filename if c.isalnum() or c in (' ', '_', '-')).rstrip()
+def to_cotovia(text):
+    # Random suffix so that concurrent runs do not overwrite each other's temporary files
+    res = ''.join(random.choices(string.ascii_lowercase + string.digits, k=5))
+    # Input and output Cotovía files
+    COTOVIA_IN_TXT_PATH = res + '.txt'
+    COTOVIA_IN_TXT_PATH_ISO = 'iso8859-1' + res + '.txt'
+    COTOVIA_OUT_PRE_PATH = 'iso8859-1' + res + '.pre'
+    COTOVIA_OUT_PRE_PATH_UTF8 = 'utf8' + res + '.pre'
+    with open(COTOVIA_IN_TXT_PATH, 'w', encoding='utf-8') as f:
+        f.write(text + '\n')
+    # utf-8 to iso8859-1
+    subprocess.run(["iconv", "-f", "utf-8", "-t", "iso8859-1", COTOVIA_IN_TXT_PATH, "-o", COTOVIA_IN_TXT_PATH_ISO], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
+    subprocess.run(["cotovia", "-i", COTOVIA_IN_TXT_PATH_ISO, "-p"], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
+    subprocess.run(["iconv", "-f", "iso8859-1", "-t", "utf-8", COTOVIA_OUT_PRE_PATH, "-o", COTOVIA_OUT_PRE_PATH_UTF8], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
+    segs = []
+    try:
+        with open(COTOVIA_OUT_PRE_PATH_UTF8, 'r', encoding='utf-8') as f:
+            segs = [line.rstrip() for line in f]
+    except (OSError, UnicodeDecodeError):
+        print("ERROR: Couldn't read cotovia output")
+    subprocess.run(["rm", COTOVIA_IN_TXT_PATH, COTOVIA_IN_TXT_PATH_ISO, COTOVIA_OUT_PRE_PATH, COTOVIA_OUT_PRE_PATH_UTF8], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
+    return segs
+def text_preprocess(text):
+    cotovia_preproc_text = to_cotovia(text)
+    # convert list to string
+    cotovia_preproc_text_res = ' '.join(cotovia_preproc_text)
+    # add final punctuation if missing
+    if cotovia_preproc_text_res and cotovia_preproc_text_res[-1] not in string.punctuation:
+        cotovia_preproc_text_res += '.'
+    return cotovia_preproc_text_res
+def main():
+    parser = argparse.ArgumentParser(description='Cotovía text normalisation')
+    parser.add_argument('text', type=str, help='Text to synthesize')
+    parser.add_argument('model_path', type=str, help='Absolute path to the model checkpoint.pth')
+    parser.add_argument('config_path', type=str, help='Absolute path to the model config.json')
+    args = parser.parse_args()
+    print("Text before preprocessing: ", args.text)
+    text = text_preprocess(args.text)
+    print("Text after preprocessing: ", text)
+    synthesizer = Synthesizer(
+        args.model_path, args.config_path, None, None, None, None,
+    )
+    # Step 1: Extract the first word from the text
+    first_word = args.text.split()[0] if args.text.split() else "audio"
+    first_word = sanitize_filename(first_word)  # Sanitize to make it a valid filename
+    # Step 2: Use synthesizer's built-in function to synthesize and save the audio
+    wavs = synthesizer.tts(text)
+    filename = f"{first_word}.wav"
+    synthesizer.save_wav(wavs, filename)
+    print(f"Audio file saved as: {filename}")
+if __name__ == "__main__":
+    main()
 ```
+This Python code takes an input text, normalizes it using Cotovía’s front-end, synthesizes speech from the normalized text, and saves the result as a `.wav` file named after the first word of the input.
+A more advanced version, including additional text preprocessing, can be found in the script `synthesize.py`, available in this repository. You can use this script to synthesize speech from an input text as follows:
+```bash
+python synthesize.py text model_path config_path
+```
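The two small text steps in the Python example (appending final punctuation and deriving the output file name) can be checked on their own, without Cotovía or a model checkpoint. The sketch below is illustrative only: the function names `ensure_final_punctuation` and `output_filename` are hypothetical, and the fallback to `audio` when the first word contains no usable characters is an assumption rather than the repository's behavior.

```python
import string


def ensure_final_punctuation(text: str) -> str:
    """Append a period when the text does not already end in ASCII punctuation."""
    stripped = text.rstrip()
    # string.punctuation only covers ASCII, so characters such as "…" or "»"
    # are not treated as final punctuation here.
    if stripped and stripped[-1] not in string.punctuation:
        return stripped + "."
    return stripped


def output_filename(text: str, fallback: str = "audio") -> str:
    """Build a .wav file name from the first word of the input text."""
    words = text.split()
    first = words[0] if words else fallback
    # Keep only letters, digits, underscores, and hyphens.
    safe = "".join(c for c in first if c.isalnum() or c in ("_", "-"))
    # Assumption: fall back to a fixed name if nothing usable remains.
    return f"{safe or fallback}.wav"


print(ensure_final_punctuation("Son Celtia"))
print(output_filename("Son Celtia, unha voz creada con intelixencia artificial"))
```

Guarding against an empty string matters because normalization returns no text when Cotovía fails, and indexing the last character of an empty string raises an `IndexError`.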
 ## Training