German Voice Cloning TTS Model using F5-TTS Architecture
A German Text-to-Speech system capable of cloning voices from a few seconds of reference audio, built on the F5-TTS architecture.
Model Details
- Developed by: Johanna Reiml and team at KI-Servicezentrum, Hasso-Plattner-Institut (HPI)
- Base Model: SWivid/F5-TTS
- Paper: F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Key Features & Capabilities
- Generates natural-sounding German speech from text
- Clones a voice from only a few seconds of reference audio
- Suitable for audiobooks, voice assistants, and accessibility applications
Technical Specifications
Download checkpoints from one of two directories, depending on the vocoder: F5TTS_Base (Vocos) or F5TTS_Base_bigvgan (BigVGAN).
- Datasets: Common Voice (Mozilla) and Emilia_DE
- Process: Fine-tuned from checkpoints of the base F5-TTS model
- Training Hardware: 8x NVIDIA H100 GPUs
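The checkpoint directories named above can be fetched selectively rather than downloading the whole repository. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that the repository id is `aihpi/F5-TTS-German` as shown on the model page. The helper names `checkpoint_patterns` and `download_checkpoints` are hypothetical, and the file layout inside each directory is not specified here, so the pattern simply matches everything under the chosen directory.

```python
from pathlib import Path

# Repository id as shown on the model page (assumption: unchanged on the Hub).
REPO_ID = "aihpi/F5-TTS-German"

# Directory names come from the model card; the vocoder keys are labels chosen for this sketch.
CHECKPOINT_DIRS = {
    "vocos": "F5TTS_Base",
    "bigvgan": "F5TTS_Base_bigvgan",
}


def checkpoint_patterns(vocoder: str) -> list[str]:
    """Return the allow-patterns that select one checkpoint directory."""
    try:
        directory = CHECKPOINT_DIRS[vocoder]
    except KeyError:
        raise ValueError(
            f"unknown vocoder {vocoder!r}; expected one of {sorted(CHECKPOINT_DIRS)}"
        ) from None
    # The trailing "/*" keeps "F5TTS_Base" from also matching "F5TTS_Base_bigvgan".
    return [f"{directory}/*"]


def download_checkpoints(vocoder: str = "vocos", local_dir: str = "checkpoints") -> Path:
    """Download only the files for the chosen vocoder and return the local path."""
    # Imported lazily so the pattern helper works without the dependency installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=checkpoint_patterns(vocoder),
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_checkpoints("bigvgan")` would place the BigVGAN checkpoint files under `checkpoints/F5TTS_Base_bigvgan/`. Restricting the download with allow-patterns avoids pulling both vocoder variants when only one is needed.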
Contact
- AI Service Center: kisz@hpi.de
- Johanna Reiml: johanna@reiml.dev
- Enes Suermeli: muhammed.suermeli@student.hpi.uni-potsdam.de
- Kajo Kratzenstein: kajo.kratzenstein@student.hpi.de
- Carlos Menke: carlos.menke@rwth-aachen.de
Acknowledgements
The authors acknowledge the financial support by the German Federal Ministry for Education and Research (BMBF) through the project «KI-Servicezentrum Berlin Brandenburg» (01IS22092).