---
language:
- de
license: cc-by-nc-4.0
tags:
- speech
- text-to-speech
- F5-TTS
datasets:
- amphion/Emilia-Dataset
- fsicoli/common_voice_19_0
library_name: f5_tts
base_model:
- SWivid/F5-TTS
---
# German Voice Cloning TTS Model using F5-TTS Architecture
A German Text-to-Speech system capable of cloning voices from a few seconds of reference audio, built on the F5-TTS architecture.
## Model Details
- **Developed by:** Johanna Reiml and team at KI-Servicezentrum, Hasso-Plattner-Institut (HPI)
- **Base Model:** [SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS)
- **Paper:** [F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching](https://arxiv.org/abs/2410.06885)
## Key Features & Capabilities
- Generates natural-sounding German speech from text
- Clones voices from minimal reference audio (a few seconds)
- Suitable for audiobooks, voice assistants, and accessibility applications
## Technical Specifications
Download the checkpoints from the directory that matches your vocoder: `F5TTS_Base` (Vocos) or `F5TTS_Base_bigvgan` (BigVGAN).
- **Datasets:** Common Voice (Mozilla) and Emilia_DE
- **Process:** Fine-tuned from the checkpoints of the [base F5-TTS model](https://huggingface.co/SWivid/F5-TTS)
- **Training Hardware:** 8x NVIDIA H100
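
The checkpoint directories named above can be fetched selectively rather than by cloning the whole repository. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that the directory layout matches the names given in this card. The helper names `checkpoint_patterns` and `download_checkpoints`, the `local_dir` default, and the `repo_id` argument are illustrative assumptions; this card does not state the repository ID, so it has to be supplied by the caller.

```python
from __future__ import annotations

from pathlib import Path

# Directory names as given in this model card, keyed by vocoder.
CHECKPOINT_DIRS = {
    "vocos": "F5TTS_Base",
    "bigvgan": "F5TTS_Base_bigvgan",
}


def checkpoint_patterns(vocoder: str) -> list[str]:
    """Return glob patterns that select one checkpoint directory."""
    if vocoder not in CHECKPOINT_DIRS:
        expected = ", ".join(sorted(CHECKPOINT_DIRS))
        raise ValueError(f"unknown vocoder {vocoder!r}; expected one of: {expected}")
    # The trailing "/*" keeps "F5TTS_Base" from also matching "F5TTS_Base_bigvgan".
    return [f"{CHECKPOINT_DIRS[vocoder]}/*"]


def download_checkpoints(
    repo_id: str,
    vocoder: str = "vocos",
    local_dir: str = "checkpoints",  # hypothetical default, not from the card
) -> Path:
    """Download only the files for the chosen vocoder and return the local path."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(vocoder),
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_checkpoints("<this-repo-id>", vocoder="bigvgan")` would then fetch only the `F5TTS_Base_bigvgan` files. Restricting the download with `allow_patterns` avoids pulling both checkpoint variants when only one vocoder is used.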
## Contact
- AI Service Center: kisz@hpi.de
- Johanna Reiml: johanna@reiml.dev
- Enes Suermeli: muhammed.suermeli@student.hpi.uni-potsdam.de