Voice Embedding JSON Files
I have uploaded the voice embeddings in JSON format on the hub. You can check them out at ecyht2/kokoro-82M-voices.
P.S. I got the script from https://github.com/thewh1teagle/kokoro-onnx/blob/main/scripts/fetch_voices.py; https://github.com/lucasjinreal/Kokoros/blob/main/scripts/fetch_voices.py seems to be similar.
P.S.2. You can also use the voice embeddings in the kokorojs repo.
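For anyone wanting to use the JSON embeddings, a minimal sketch of loading one back into an array might look like this (the file layout is an assumption: a mapping from voice name to a nested list of floats; the inline string below is a toy stand-in for the real hub file):

```python
import json

import numpy as np

# Toy stand-in for the real voices JSON from the hub; the actual
# embeddings are much larger tensors per voice.
voices_json = '{"af": [[[0.1, 0.2]], [[0.3, 0.4]]]}'
voices = json.loads(voices_json)

# Convert a voice back into a float32 array suitable for an ONNX session.
style = np.array(voices["af"], dtype=np.float32)
print(style.shape)  # shape of the toy data, here (2, 1, 2)
```

With the real file you would `json.load` it from disk instead and index by the voice name you want.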
Hi ecyht2, awesome work! I was planning to do the same with voice embeddings as well. There is potential for adding Kokoro to a pretty popular extension called Read Aloud through the sub-extension https://github.com/ken107/piper-browser-extension. Both our embeddings (I made kokorojs) seem to still be in tensor format, whereas the extension seems to use a different encoding scheme for its voices: https://huggingface.co/rhasspy/piper-voices/blob/main/ar/ar_JO/kareem/low/ar_JO-kareem-low.onnx.json.
I'm trying to port Kokoro to this extension; any help with understanding how this voice data could be encoded to look more like the format used in piper-voices would be great. Maybe my understanding of how these voice files work for this model is completely incorrect.
I am not exactly sure how the voices in Piper TTS work. From reading the code on their GitHub, it seems the voice is embedded inside the ONNX model itself, unlike Kokoro. The text is phonemized and converted to IDs using the map inside the .json file; those IDs are then fed into the ONNX session of the loaded model.
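That phoneme-to-ID lookup step can be sketched like this. The toy `phoneme_id_map` below mimics the layout of a real piper `*.onnx.json` config (each phoneme maps to a list of integer IDs); the specific ID values and the BOS/EOS markers are assumptions for illustration, not the real table:

```python
import json

# Toy stand-in for a piper *.onnx.json voice config; a real one contains a
# "phoneme_id_map" plus audio/inference settings. IDs here are made up.
config_json = '{"phoneme_id_map": {"^": [1], "$": [2], "h": [20], "i": [27]}}'
config = json.loads(config_json)
id_map = config["phoneme_id_map"]

def phonemes_to_ids(phonemes: str) -> list[int]:
    """Map each phoneme character to its ID(s), wrapped in BOS/EOS markers."""
    ids = list(id_map["^"])           # beginning-of-sequence marker
    for ph in phonemes:
        ids.extend(id_map.get(ph, []))  # skip phonemes missing from the map
    ids.extend(id_map["$"])           # end-of-sequence marker
    return ids

print(phonemes_to_ids("hi"))  # -> [1, 20, 27, 2]
```

The resulting ID array is what gets passed as an input tensor to the ONNX session; since the voice lives inside the Piper model, there is no separate style-embedding input like Kokoro has.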