Text-to-Speech
Fairseq
English
audio

adding pauses and dealing with numbers

#14
by dekislev - opened

Just wanted to share what worked for me.

I noticed the model has a bit of trouble with numbers and punctuation, but it handles a ',' quite well (it reads it as a pause), so I preprocessed my text with:

text = text.replace(".", ",").replace("!", ",").replace("?", ",").replace(":", ",").replace(";", ",")
text = text.replace("(",',').replace(")",',').replace("[",',').replace("]",',').replace("{",',').replace("}",',')
text = text.replace('"',',').replace("“",',').replace("”",',')
text = text.replace("-",' ').replace("_",' ').replace("\u2014",' ').replace("–",' ').replace("…",' ')  # \u2014 is the em dash
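For anyone who finds the long replace chain hard to maintain, here is a minimal sketch of the same mapping done with a translation table. The names `TO_COMMA`, `TO_SPACE`, `PAUSE_TABLE` and `normalize_punctuation` are my own and not from the post, and the non-ASCII characters are written as escapes on the assumption that the post meant curly quotes, the em dash, the en dash and the ellipsis.

```python
# Characters mapped to a comma (which the model reads as a pause).
# \u201c and \u201d are the left and right curly double quotes.
TO_COMMA = '.!?:;()[]{}"\u201c\u201d'
# Characters mapped to a plain space.
# \u2014 = em dash, \u2013 = en dash, \u2026 = ellipsis.
TO_SPACE = '-_\u2014\u2013\u2026'

# One table for both groups; str.translate applies it in a single pass.
PAUSE_TABLE = str.maketrans(
    {**{c: ',' for c in TO_COMMA}, **{c: ' ' for c in TO_SPACE}}
)

def normalize_punctuation(text: str) -> str:
    """Replace pause-like punctuation with commas and joiners with spaces."""
    return text.translate(PAUSE_TABLE)

print(normalize_punctuation('Wait... (really?) yes-no'))
```

Like the replace chain, this leaves runs of commas in place (for example after "..."), so the audible result should be the same.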

In addition, I saw it has a bit of a problem pronouncing numbers such as years, so before doing the replacements above I processed the text with:

from num2words import num2words
import re

def convert_numbers_to_text(text):
    # Regular expression pattern to match numbers
    pattern = r'\b\d+\b'
    
    def replace(match):
        number = int(match.group())
        return num2words(number)
    
    # Replace numbers in the text with their textual representation
    converted_text = re.sub(pattern, replace, text)

    return converted_text

text = convert_numbers_to_text(text)
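One thing worth knowing about the `\b\d+\b` pattern: it only picks up plain runs of digits, so decimals and thousands separators get split into separate numbers. The sketch below just shows what each pattern matches. `WIDER` is a hypothetical alternative of mine, not something from the post, and it is only one possible way to widen the match.

```python
import re

# The pattern used in the post: whole runs of digits only.
SIMPLE = r'\b\d+\b'
# Hypothetical wider pattern (not from the post): tries thousands-separated
# integers first, then plain integers with an optional decimal part.
WIDER = r'\b\d{1,3}(?:,\d{3})+\b|\b\d+(?:\.\d+)?\b'

sample = "In 1984, 3.5 million and 1,000 more; the 2nd"

simple_hits = re.findall(SIMPLE, sample)
wider_hits = re.findall(WIDER, sample)

print(simple_hits)  # ['1984', '3', '5', '1', '000']
print(wider_hits)   # ['1984', '3.5', '1,000']
```

Neither pattern touches "2nd", because there is no word boundary between the digit and the letters. If you use something like `WIDER`, a match such as "1,000" needs its comma stripped before it is passed to `int()`.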

Hope it helps you too.

That did the job, thanks a lot!

I actually ended up splitting the paragraphs at the periods and feeding the resulting sentences to the model separately, which gave better results.
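A minimal sketch of that splitting step, assuming a sentence ends at '.', '!' or '?' followed by whitespace. The function name `split_sentences` is hypothetical, and the call into the model is left out because it depends on your setup.

```python
import re
from typing import List

def split_sentences(paragraph: str) -> List[str]:
    """Split after '.', '!' or '?' followed by whitespace, keeping the punctuation."""
    parts = re.split(r'(?<=[.!?])\s+', paragraph.strip())
    # Drop empty strings, which appear when the input is empty.
    return [p for p in parts if p]

chunks = split_sentences("It was late. Was anyone home? Nobody answered!")
print(chunks)  # ['It was late.', 'Was anyone home?', 'Nobody answered!']
```

Each chunk would then go through the number and punctuation preprocessing before being synthesized on its own. Note that this simple rule also splits after abbreviations such as "Dr.", so those may need special handling.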
