aihpi/F5-TTS-German


	---
	language:
	- de
	license: cc-by-nc-4.0
	tags:
	- speech
	- text-to-speech
	- F5-TTS
	datasets:
	- amphion/Emilia-Dataset
	- fsicoli/common_voice_19_0
	library_name: f5_tts
	base_model:
	- SWivid/F5-TTS
	---

	# German Voice Cloning TTS Model using F5-TTS Architecture

	A German Text-to-Speech system capable of cloning voices from a few seconds of reference audio, built on the F5-TTS architecture.

	## Model Details
	- Developed by: Johanna Reiml and team at KI-Servicezentrum, Hasso-Plattner-Institut (HPI)
	- Base Model: [SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS)
	- Paper: [F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching](https://arxiv.org/abs/2410.06885)

	## Key Features & Capabilities
	- Generates natural-sounding German speech from text
	- Clones a voice from a few seconds of reference audio
	- Suitable for audiobooks, voice assistants, and accessibility applications
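As a rough illustration of how text and a short reference recording come together, the sketch below assembles a command line for the upstream F5-TTS inference tool. It assumes that the upstream `f5_tts` package is installed, that a checkpoint and vocabulary file from this repository are already available locally, and that the flag names (`--model`, `--vocoder_name`, `--ckpt_file`, `--vocab_file`, `--ref_audio`, `--ref_text`, `--gen_text`) match the installed version; they may differ between releases, so check `f5-tts_infer-cli --help` first. All file paths are hypothetical placeholders, and the mapping from vocoder to model name is an assumption based on the checkpoint directory names in this card.

```python
import shlex

# Assumption: model config names mirror the checkpoint directory names in this card.
MODEL_FOR_VOCODER = {"vocos": "F5TTS_Base", "bigvgan": "F5TTS_Base_bigvgan"}


def build_infer_command(ckpt_file, vocab_file, ref_audio, ref_text, gen_text,
                        vocoder="vocos"):
    """Build an argv list for the upstream F5-TTS inference CLI (flag names assumed)."""
    if vocoder not in MODEL_FOR_VOCODER:
        raise ValueError(f"unknown vocoder {vocoder!r}")
    return [
        "f5-tts_infer-cli",
        "--model", MODEL_FOR_VOCODER[vocoder],
        "--vocoder_name", vocoder,
        "--ckpt_file", ckpt_file,    # fine-tuned German checkpoint
        "--vocab_file", vocab_file,  # vocabulary matching that checkpoint
        "--ref_audio", ref_audio,    # a few seconds of the voice to clone
        "--ref_text", ref_text,      # transcript of the reference audio
        "--gen_text", gen_text,      # German text to synthesize
    ]


if __name__ == "__main__":
    cmd = build_infer_command(
        ckpt_file="path/to/checkpoint",   # hypothetical path
        vocab_file="path/to/vocab.txt",   # hypothetical path
        ref_audio="reference.wav",        # hypothetical file
        ref_text="Dies ist die Transkription der Referenzaufnahme.",
        gen_text="Guten Tag, dies ist ein Test.",
    )
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps German text with spaces and punctuation intact if it is later passed to `subprocess.run`.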

	## Technical Specifications
	Checkpoints are provided for two vocoders: download from the directory F5TTS_Base (Vocos) or F5TTS_Base_bigvgan (BigVGAN).
	- Datasets: Common Voice (Mozilla) and Emilia_DE
	- Training: fine-tuned from the [base F5-TTS model](https://huggingface.co/SWivid/F5-TTS) checkpoints
	- Training hardware: 8x NVIDIA H100
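One possible way to fetch a single checkpoint directory is sketched below with `snapshot_download` from the `huggingface_hub` package. The repository ID `aihpi/F5-TTS-German` is taken from the model page; the helper names and the local `checkpoints` folder are illustrative choices. Because the files inside each directory are not listed here, the sketch downloads the whole directory instead of naming a specific file.

```python
from pathlib import Path

REPO_ID = "aihpi/F5-TTS-German"  # repository name as shown on the model page

# Checkpoint directories named in this card, keyed by vocoder.
CHECKPOINT_DIRS = {"vocos": "F5TTS_Base", "bigvgan": "F5TTS_Base_bigvgan"}


def checkpoint_patterns(vocoder: str) -> list[str]:
    """Return glob patterns that select one checkpoint directory."""
    if vocoder not in CHECKPOINT_DIRS:
        raise ValueError(
            f"unknown vocoder {vocoder!r}; expected one of {sorted(CHECKPOINT_DIRS)}"
        )
    return [f"{CHECKPOINT_DIRS[vocoder]}/*"]


def download_checkpoints(vocoder: str = "vocos", local_dir: str = "checkpoints") -> Path:
    """Download only the chosen checkpoint directory and return the local path."""
    # Imported lazily so checkpoint_patterns works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=checkpoint_patterns(vocoder),
        local_dir=local_dir,
    )
    return Path(path)


if __name__ == "__main__":
    # Requires network access and `pip install huggingface_hub`.
    print(download_checkpoints("vocos"))
```

Restricting the download with `allow_patterns` avoids pulling both vocoder variants when only one is needed.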

	## Contact
	- AI Service Center: kisz@hpi.de
	- Johanna Reiml: johanna@reiml.dev
	- Enes Suermeli: muhammed.suermeli@student.hpi.uni-potsdam.de
	- Kajo Kratzenstein: kajo.kratzenstein@student.hpi.de
	- Carlos Menke: carlos.menke@rwth-aachen.de


	## Acknowledgements
	The authors acknowledge financial support from the German Federal Ministry for Education and Research (BMBF) through the project «KI-Servicezentrum Berlin Brandenburg» (01IS22092).