Adding defined period pauses to the input text file
#61, opened by vijay120
Is there a way to add pauses to the text file so that TTS will pause for "x" seconds before continuing with the next sentence? Currently I need to create separate text files and merge the TTS output manually with an "x"-second pause.
I second this. It would be awesome:
- to be able to add pauses with a tag like [BREAK=3] (a 3-second pause) or something similar
- to control the delay between sentences; right now it's too fast
- to add other tags for filler words, etc.
I had the same issue and my approach was to handle it manually. It works with either torch tensors or numpy arrays, though the syntax differs slightly:
- I generate the speech segment. Kokoro returns segments as numpy arrays, and I convert them into torch tensors, but this conversion isn't necessary.
- I manually create a silence. Since Kokoro's audio sample rate is 24000 Hz, a 3-second silence can be created with silence = torch.zeros(1, 3*24000). With numpy arrays it is the same, using np.zeros instead of torch.zeros.
- Then I continue generating all the segments I need, and all of them, both speech and silence, are appended to a Python list.
- When I finish generating, I concatenate all segments with torch.cat() or np.concatenate().
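The steps above can be sketched roughly as follows, using only numpy. This is a minimal sketch, not Kokoro's API: make_silence and join_with_pauses are hypothetical helper names, the segments are assumed to be 1-D float arrays at 24000 Hz, and the two stand-in arrays take the place of real generated speech.

```python
import numpy as np

SAMPLE_RATE = 24000  # Kokoro's output sample rate, as stated above


def make_silence(seconds, sample_rate=SAMPLE_RATE):
    """Return a 1-D float32 array of zeros lasting `seconds` seconds."""
    return np.zeros(int(round(seconds * sample_rate)), dtype=np.float32)


def join_with_pauses(segments, pause_seconds, sample_rate=SAMPLE_RATE):
    """Concatenate 1-D audio segments with silence between each pair."""
    pieces = []
    for i, segment in enumerate(segments):
        if i > 0:
            # Insert the pause before every segment except the first.
            pieces.append(make_silence(pause_seconds, sample_rate))
        # Flatten in case a segment arrives with shape (1, N).
        pieces.append(np.asarray(segment, dtype=np.float32).reshape(-1))
    if not pieces:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(pieces)


# Stand-in segments (assumption): replace these with the generated speech arrays.
first = np.full(SAMPLE_RATE, 0.1, dtype=np.float32)        # 1 second
second = np.full(SAMPLE_RATE // 2, 0.1, dtype=np.float32)  # 0.5 seconds

audio = join_with_pauses([first, second], pause_seconds=3)
print(audio.shape[0] / SAMPLE_RATE)  # total duration in seconds
```

With torch tensors the same idea should apply, swapping np.zeros for torch.zeros and np.concatenate for torch.cat, as long as all segments share the same shape convention.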
However, it would be great to have a list of special tokens to do this kind of thing with the model itself: not only pauses but also laughs, emotion, etc.
Anyway, I hope this is useful for the task you are describing :)