
Adding defined period pauses to the input text file

#61
by vijay120 - opened

Is there a way to add pauses to the text file so that the TTS pauses for "x" seconds before continuing with the next sentence? Currently I have to create separate text files and manually merge the TTS output with an "x"-second pause in between.

I second this. It would be awesome:

  1. to be able to add pauses with a tag like [BREAK=3] (in seconds) or something like that
  2. to control the delay between sentences; right now it's too fast
  3. to add other tags for filler words, etc.
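Until something like this is supported natively, a tag of this kind could be handled as a preprocessing step. The sketch below is only an illustration: the [BREAK=n] syntax is the hypothetical format suggested above, not something the model understands, and `split_on_breaks` is a made-up helper name. It splits a script into text chunks and pause durations, which could then be synthesized and padded separately.

```python
import re

# Hypothetical tag format taken from the suggestion above: [BREAK=<seconds>]
BREAK_TAG = re.compile(r"\[BREAK=(\d+(?:\.\d+)?)\]")

def split_on_breaks(text):
    """Return ("text", str) and ("pause", float) items in reading order."""
    # With one capture group, re.split alternates: text, seconds, text, ...
    parts = BREAK_TAG.split(text)
    items = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            chunk = part.strip()
            if chunk:  # skip empty chunks between adjacent tags
                items.append(("text", chunk))
        else:
            items.append(("pause", float(part)))
    return items

script = "Hello there. [BREAK=3] How are you? [BREAK=0.5] Goodbye."
items = split_on_breaks(script)
for kind, value in items:
    print(kind, value)
```

Each "text" item would go to the TTS call, and each "pause" item would become a block of silence of that length.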

I had the same issue and my approach was to handle it manually. It works with either torch tensors or numpy arrays, though the syntax differs slightly:

  1. I generate the speech segment. Kokoro returns segments as numpy arrays; I convert them to torch tensors, but this conversion isn't necessary.
  2. I manually create a silence. Since Kokoro's audio has a sample rate of 24000 Hz, a 3-second silence can be generated with silence = torch.zeros(1, 3*24000). With numpy arrays it is the same using np.zeros instead of torch.zeros, except that the shape must be passed as a tuple: np.zeros((1, 3*24000)).
  3. Then I continue generating all the segments I need, and all of them, both speech and silence, are appended to a Python list.
  4. When I finish generating, I concatenate all segments along the time dimension with torch.cat() or np.concatenate().
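The steps above can be sketched with numpy alone. This is a minimal illustration, not Kokoro's API: the speech segments here are placeholder arrays standing in for real TTS output, `make_silence` and `join_with_pauses` are hypothetical helper names, and segments are assumed to be (or to flatten to) 1-D mono audio at 24000 Hz.

```python
import numpy as np

SAMPLE_RATE = 24000  # Kokoro's output sample rate, as stated above

def make_silence(seconds, sample_rate=SAMPLE_RATE):
    """Return `seconds` of silence as a 1-D float32 array."""
    return np.zeros(int(round(seconds * sample_rate)), dtype=np.float32)

def join_with_pauses(segments, pause_seconds):
    """Concatenate speech segments with a fixed pause between each pair."""
    if not segments:
        raise ValueError("need at least one segment")
    pieces = []
    for i, seg in enumerate(segments):
        if i > 0:
            pieces.append(make_silence(pause_seconds))
        # Flatten so (1, N) and (N,) inputs can be mixed safely
        pieces.append(np.asarray(seg, dtype=np.float32).reshape(-1))
    return np.concatenate(pieces)

# Placeholder arrays standing in for real TTS output (assumption)
speech_a = np.full(1 * SAMPLE_RATE, 0.1, dtype=np.float32)  # 1 second
speech_b = np.full(2 * SAMPLE_RATE, 0.2, dtype=np.float32)  # 2 seconds

audio = join_with_pauses([speech_a, speech_b], pause_seconds=3)
print(audio.shape[0] / SAMPLE_RATE, "seconds")
```

The resulting array can be written out with whatever audio library is already in use, as long as the same 24000 Hz sample rate is passed when saving.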

However, it would be great to have a list of special tokens for doing this kind of thing within the model itself: not only pauses but also laughs, emotion, etc.

Anyway, I hope this helps with what you are trying to do :)
