---
language:
- de
license: cc-by-nc-4.0
tags:
- speech
- text-to-speech
- F5-TTS
datasets:
- amphion/Emilia-Dataset
- fsicoli/common_voice_19_0
library_name: f5_tts
base_model:
- SWivid/F5-TTS
---

# German Voice Cloning TTS Model using F5-TTS Architecture

A German text-to-speech system that clones voices from a few seconds of reference audio, built on the F5-TTS architecture.

## Model Details

- **Developed by:** Johanna Reiml and team at KI-Servicezentrum, Hasso-Plattner-Institut (HPI)
- **Base Model:** [SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS)
- **Paper:** [F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching](https://arxiv.org/abs/2410.06885)

## Key Features & Capabilities

- Generates natural-sounding German speech from text
- Clones voices from minimal reference audio (a few seconds)
- Suitable for audiobooks, voice assistants, and accessibility applications

## Technical Specifications

Download checkpoints from the directory `F5TTS_Base` (vocos vocoder) or `F5TTS_Base_bigvgan` (bigvgan vocoder).

- **Datasets:** Common Voice (Mozilla) and Emilia_DE
- **Process:** Fine-tuned from checkpoints of the [base F5-TTS model](https://huggingface.co/SWivid/F5-TTS)
- **Training Hardware:** 8x NVIDIA H100

## Contact

- AI Service Center: kisz@hpi.de
- Johanna Reiml: johanna@reiml.dev
- Enes Suermeli: muhammed.suermeli@student.hpi.uni-potsdam.de
- Kajo Kratzenstein: kajo.kratzenstein@student.hpi.de
- Carlos Menke: carlos.menke@rwth-aachen.de

## Acknowledgements

The authors acknowledge financial support from the German Federal Ministry for Education and Research (BMBF) through the project «KI-Servicezentrum Berlin Brandenburg» (01IS22092).
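
The checkpoint directories named under Technical Specifications can also be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. The helper names `checkpoint_patterns` and `download_checkpoints` are hypothetical, and the repository id is a placeholder you must supply yourself, since this card does not state it.

```python
# Sketch only: maps each vocoder to the checkpoint directory named in this card.
VOCODER_DIRS = {
    "vocos": "F5TTS_Base",
    "bigvgan": "F5TTS_Base_bigvgan",
}


def checkpoint_patterns(vocoder: str) -> list[str]:
    """Return glob patterns selecting only the files for one vocoder variant."""
    if vocoder not in VOCODER_DIRS:
        raise ValueError(
            f"unknown vocoder {vocoder!r}; expected one of {sorted(VOCODER_DIRS)}"
        )
    return [f"{VOCODER_DIRS[vocoder]}/*"]


def download_checkpoints(repo_id: str, vocoder: str = "vocos") -> str:
    """Download one checkpoint directory and return the local snapshot path.

    `repo_id` is an assumption: pass the Hugging Face id of this model repo.
    """
    # Imported lazily so the pattern helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(vocoder),
    )
```

Calling `download_checkpoints("<this-model-repo-id>", vocoder="bigvgan")` would then fetch only the `F5TTS_Base_bigvgan` directory. Restricting the download with `allow_patterns` avoids pulling both vocoder variants when only one is needed.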