---
language:
- de
license: cc-by-nc-4.0
tags:
- speech
- text-to-speech
- F5-TTS
datasets:
- amphion/Emilia-Dataset
- fsicoli/common_voice_19_0
library_name: f5_tts
base_model:
- SWivid/F5-TTS
---

# German Voice Cloning TTS Model using F5-TTS Architecture

A German Text-to-Speech system capable of cloning voices from a few seconds of reference audio, built on the F5-TTS architecture.

## Model Details

- **Developed by:** Johanna Reiml and team at KI-Servicezentrum, Hasso-Plattner-Institut (HPI)
- **Base Model:** [SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS)
- **Paper:** [F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching](https://arxiv.org/abs/2410.06885)

## Key Features & Capabilities

- Generates natural-sounding German speech from text
- Clones voices from minimal reference audio (a few seconds)
- Suitable for audiobooks, voice assistants, and accessibility applications

## Technical Specifications

Download checkpoints from the directory `F5TTS_Base` (Vocos vocoder) or `F5TTS_Base_bigvgan` (BigVGAN vocoder).

- **Datasets:** Common Voice (Mozilla) and Emilia_DE
- **Process:** Fine-tuned from checkpoints of the [base F5-TTS model](https://huggingface.co/SWivid/F5-TTS)
- **Trained on Hardware:** 8x NVIDIA H100
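
Fetching only one of the two checkpoint directories can be sketched as follows. This is a minimal sketch, not an official download script: it assumes the checkpoints are hosted in a Hugging Face Hub repository and that the `huggingface_hub` package is installed. The helper names `checkpoint_pattern` and `download_checkpoints` are hypothetical, and the repository ID is left as a parameter because it is not stated in this card.

```python
from typing import Optional

# Directory names are taken from this model card; the keys name the vocoder.
CHECKPOINT_DIRS = {
    "vocos": "F5TTS_Base",
    "bigvgan": "F5TTS_Base_bigvgan",
}


def checkpoint_pattern(vocoder: str) -> str:
    """Return a glob pattern matching all files in the directory for `vocoder`."""
    key = vocoder.lower()
    if key not in CHECKPOINT_DIRS:
        raise ValueError(
            f"unknown vocoder {vocoder!r}; expected one of {sorted(CHECKPOINT_DIRS)}"
        )
    return f"{CHECKPOINT_DIRS[key]}/*"


def download_checkpoints(
    repo_id: str, vocoder: str = "vocos", local_dir: Optional[str] = None
) -> str:
    """Download only the chosen checkpoint directory; return the local path.

    `repo_id` is an assumption supplied by the caller (hypothetical here).
    """
    # Imported lazily so checkpoint_pattern() works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=[checkpoint_pattern(vocoder)],
        local_dir=local_dir,
    )
```

A call such as `download_checkpoints("<namespace>/<repo-name>", vocoder="bigvgan")` would then fetch only the files under `F5TTS_Base_bigvgan`; the repository ID shown is a placeholder to be replaced with the actual one.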

## Contact

- AI Service Center: kisz@hpi.de
- Johanna Reiml: johanna@reiml.dev
- Enes Suermeli: muhammed.suermeli@student.hpi.uni-potsdam.de