JackismyShephard/danish-speech-synthesis · Apply for community grant: Personal project (gpu)

Given that Danish is a low-resource language, few open-source implementations of a Danish text-to-speech synthesizer are available online. As of this writing, the only other implementations available on 🤗 are facebook/seamless-streaming and audo/seamless-m4t-v2-large.

My personal project is developing a simpler alternative that still performs reasonably well, both in output quality and inference time. I believe this project is of great value to the community, especially since the aforementioned models do not have an associated space like mine on 🤗, which provides an easy interface for Danish text-to-speech synthesis as well as optional speech enhancement.

Unfortunately, this space is currently not very user-friendly because it runs on CPU. Concretely, synthesizing speech from text containing multiple sentences takes more than a minute, and if speech enhancement is turned on, the full runtime is close to five minutes. It would therefore be of great benefit if you could provide GPU resources for this space.

For reference, the model itself can be found at JackismyShephard/speecht5_tts-finetuned-nst-da. I have also attached an example of speech generated by this space below.