hexgrad/Kokoro-82M · Adding defined period pauses to the input text file

14 days ago

Is there a way to add pauses to the text file so that the TTS pauses for "x" seconds before continuing with the next sentence? Currently I need to create separate text files and manually merge the TTS output with an "x"-second pause in between.

alpacaxz

13 days ago

I second this. It would be awesome:

to have the ability to add pauses in a format like [BREAK=3] (in seconds) or something like that
to control the delay between sentences; right now it's too fast
to add other tags for filler words, etc.

cgarijo-Ci21

12 days ago

I had the same issue and my approach was to handle it manually. It works with either torch tensors or numpy arrays, though the syntax differs slightly:

I generate the speech segments. Kokoro returns them as numpy arrays, and I convert them to torch tensors, although that conversion isn't necessary.
I manually create a silence. Since Kokoro's audio has a sample rate of 24000 Hz, a 3-second silence can be created with
silence = torch.zeros(1, 3*24000). With numpy arrays it is the same, but with np.zeros instead of torch.zeros.
I keep generating all the segments I need, and append every segment, both speech and silence, to a Python list.
When generation is finished, I concatenate all the segments with torch.cat() or np.concatenate(), along the time axis (dim=1 for tensors shaped (1, n) like the silence above).
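The steps above can be sketched roughly as follows, using numpy only. The helper names make_silence and join_with_pauses are made up for illustration, and the two "speech" arrays are random stand-ins for whatever Kokoro actually returns; the sketch assumes each segment is a 1-D float array at 24000 Hz.

```python
import numpy as np

SAMPLE_RATE = 24000  # Kokoro's output sample rate, as mentioned above


def make_silence(seconds, sample_rate=SAMPLE_RATE):
    """Return a 1-D float32 array of zeros lasting `seconds` seconds."""
    return np.zeros(int(round(seconds * sample_rate)), dtype=np.float32)


def join_with_pauses(segments, pause_seconds, sample_rate=SAMPLE_RATE):
    """Concatenate 1-D audio arrays, inserting a silence between each pair."""
    pieces = []
    for i, segment in enumerate(segments):
        if i > 0:
            pieces.append(make_silence(pause_seconds, sample_rate))
        pieces.append(np.asarray(segment, dtype=np.float32))
    if not pieces:
        # np.concatenate raises on an empty list, so return empty audio instead.
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(pieces)


# Stand-ins for two generated speech segments (1 s and 2 s of low-level noise).
# In real use these would be the arrays produced by the TTS model.
rng = np.random.default_rng(0)
speech_a = rng.uniform(-0.1, 0.1, SAMPLE_RATE).astype(np.float32)
speech_b = rng.uniform(-0.1, 0.1, 2 * SAMPLE_RATE).astype(np.float32)

audio = join_with_pauses([speech_a, speech_b], pause_seconds=3)
print(audio.shape[0] / SAMPLE_RATE)  # total duration in seconds
```

If the segments are shaped (1, n) instead of (n,), the silence needs the same extra dimension and the concatenation has to run along the last axis.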

However, it would be great to have a list of special tokens to do this kind of thing with the model itself: not only pauses but also laughs, emotion, etc.
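Until something like that exists, a tag such as the [BREAK=3] suggested earlier in the thread could be handled in a preprocessing step. The sketch below is only an illustration: the [BREAK=x] syntax is not something Kokoro understands, and split_on_breaks is a made-up helper that just turns a script into a list of text chunks and pause lengths.

```python
import re

# Hypothetical tag syntax: [BREAK=3] or [BREAK=0.5], with the value in seconds.
BREAK_TAG = re.compile(r"\[BREAK=(\d+(?:\.\d+)?)\]")


def split_on_breaks(text):
    """Split text into ("text", str) and ("pause", seconds) items, in order."""
    items = []
    position = 0
    for match in BREAK_TAG.finditer(text):
        chunk = text[position:match.start()].strip()
        if chunk:
            items.append(("text", chunk))
        items.append(("pause", float(match.group(1))))
        position = match.end()
    tail = text[position:].strip()
    if tail:
        items.append(("text", tail))
    return items


script = "First sentence. [BREAK=3] Second sentence. [BREAK=0.5] Third."
plan = split_on_breaks(script)
for kind, value in plan:
    print(kind, value)
```

Each "text" item would then be sent to the TTS separately, and each "pause" item would become an array of zeros with seconds * 24000 samples before everything is concatenated.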

Anyway, I hope this is useful for what you are trying to do :)