Update README.md
README.md
CHANGED
@@ -1,3 +1,40 @@
 ---
+language:
+- de
 license: cc-by-nc-4.0
+tags:
+- speech
+- text-to-speech
+- F5-TTS
+datasets:
+- amphion/Emilia-Dataset
+- fsicoli/common_voice_19_0
+library_name: f5_tts
+base_model:
+- SWivid/F5-TTS
 ---
+
+# German Voice Cloning TTS Model using F5-TTS Architecture
+
+A German Text-to-Speech system capable of cloning voices from a few seconds of reference audio, built on the F5-TTS architecture.
+
+## Model Details
+- **Developed by:** Johanna Reiml and team at KI-Servicezentrum, Hasso-Plattner-Institut (HPI)
+- **Base Model:** [SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS)
+- **Paper:** [F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching](https://arxiv.org/abs/2410.06885)
+
+## Key Features & Capabilities
+- Generates natural-sounding German speech from text
+- Clones voices using minimal reference audio (few seconds)
+- Suitable for audiobooks, voice assistants, and accessibility applications
+
+## Technical Specifications
+Download checkpoints from the directories F5TTS_Base (vocos) or F5TTS_Base_bigvgan (bigvgan).
+- **Datasets:** Common Voice (Mozilla) and Emilia_DE
+- **Process:** Fine-tuned checkpoints of [base F5-TTS model](https://huggingface.co/SWivid/F5-TTS)
+- **Trained on Hardware:** 8x NVIDIA H100
+
+## Contact
+- AI Service Center: kisz@hpi.de
+- Johanna Reiml: johanna@reiml.dev
+- Enes Suermeli: muhammed.suermeli@student.hpi.uni-potsdam.de
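The README tells readers to download checkpoints from either `F5TTS_Base` (vocos) or `F5TTS_Base_bigvgan` (bigvgan) but does not show how. The following is a minimal sketch, not part of this commit, of one way to fetch only the chosen directory with `huggingface_hub.snapshot_download`. The helper names `checkpoint_patterns` and `download_checkpoints` are hypothetical, and the repository id is left as a parameter because the diff does not state it.

```python
from typing import List

# Directory names as given in the README; the dictionary keys are illustrative.
CHECKPOINT_DIRS = {
    "vocos": "F5TTS_Base",
    "bigvgan": "F5TTS_Base_bigvgan",
}


def checkpoint_patterns(vocoder: str) -> List[str]:
    """Return glob patterns that select one checkpoint directory."""
    try:
        directory = CHECKPOINT_DIRS[vocoder]
    except KeyError:
        raise ValueError(
            f"unknown vocoder {vocoder!r}; expected one of {sorted(CHECKPOINT_DIRS)}"
        ) from None
    # The trailing "/*" keeps "F5TTS_Base" from also matching "F5TTS_Base_bigvgan".
    return [f"{directory}/*"]


def download_checkpoints(repo_id: str, vocoder: str = "vocos") -> str:
    """Download the selected checkpoint directory and return the local snapshot path.

    repo_id is an assumption supplied by the caller: the README diff does not
    name the model repository.
    """
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(vocoder),
    )
```

A call would look like `download_checkpoints("<org>/<model-repo>", vocoder="bigvgan")`, where the repository id is a placeholder to be replaced with the actual model repository.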