German Voice Cloning TTS Model using F5-TTS Architecture

A German Text-to-Speech system capable of cloning voices from a few seconds of reference audio, built on the F5-TTS architecture.

Model Details

Key Features & Capabilities

  • Generates natural-sounding German speech from text
  • Clones voices from only a few seconds of reference audio
  • Suitable for audiobooks, voice assistants, and accessibility applications

Technical Specifications

Checkpoints are available for download from two directories in the model repository: F5TTS_Base (for use with the vocos vocoder) and F5TTS_Base_bigvgan (for use with the bigvgan vocoder).
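
As a minimal sketch of fetching just one of those directories, the snippet below uses snapshot_download from the huggingface_hub library with an allow_patterns filter. The repository name aihpi/F5-TTS-German and the two directory names are taken from this page; the helper names, the local target directory, and the assumption that every file needed sits inside the chosen directory are illustrative and not confirmed by the model card.

```python
from pathlib import Path

REPO_ID = "aihpi/F5-TTS-German"  # repository name as shown on the model page

# Directory names come from the model card; the dictionary keys are labels chosen here.
CHECKPOINT_DIRS = {
    "vocos": "F5TTS_Base",
    "bigvgan": "F5TTS_Base_bigvgan",
}


def checkpoint_pattern(vocoder: str) -> str:
    """Return a glob pattern matching every file in the chosen checkpoint directory."""
    if vocoder not in CHECKPOINT_DIRS:
        raise ValueError(
            f"unknown vocoder {vocoder!r}; expected one of {sorted(CHECKPOINT_DIRS)}"
        )
    return f"{CHECKPOINT_DIRS[vocoder]}/*"


def download_checkpoints(vocoder: str = "vocos", local_dir: str = "checkpoints") -> Path:
    """Download only the selected checkpoint directory (requires network access)."""
    # Imported lazily so checkpoint_pattern() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[checkpoint_pattern(vocoder)],
        local_dir=local_dir,  # hypothetical target directory
    )
    return Path(path)
```

Calling download_checkpoints("bigvgan") would fetch only the files under F5TTS_Base_bigvgan. If the vocabulary file or other assets live outside these directories, the pattern list would need to be extended accordingly.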

  • Datasets: Common Voice (Mozilla) and Emilia_DE
  • Process: Fine-tuned from the checkpoints of the base F5-TTS model
  • Training hardware: 8x NVIDIA H100
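
To illustrate how a downloaded checkpoint might be used for voice cloning, the sketch below assembles a call to the f5-tts_infer-cli command that ships with the upstream F5-TTS package. It is not taken from this model card: the flag names follow the upstream CLI as commonly documented and should be checked against the installed version, the default model name is an assumption, and all file paths and texts are hypothetical placeholders.

```python
import shlex


def build_infer_command(
    ckpt_file: str,
    vocab_file: str,
    ref_audio: str,
    ref_text: str,
    gen_text: str,
    model: str = "F5TTS_Base",  # assumed model/config name; verify for your version
    vocoder: str = "vocos",
    output_dir: str = "outputs",
) -> list[str]:
    """Assemble an f5-tts_infer-cli invocation as an argument list."""
    if vocoder not in ("vocos", "bigvgan"):
        raise ValueError(f"unsupported vocoder: {vocoder!r}")
    return [
        "f5-tts_infer-cli",
        "--model", model,
        "--ckpt_file", ckpt_file,    # fine-tuned German checkpoint
        "--vocab_file", vocab_file,  # vocabulary matching the checkpoint
        "--vocoder_name", vocoder,
        "--ref_audio", ref_audio,    # a few seconds of the voice to clone
        "--ref_text", ref_text,      # transcript of the reference audio
        "--gen_text", gen_text,      # German text to synthesize
        "--output_dir", output_dir,
    ]


# Hypothetical paths and texts, for illustration only.
command = build_infer_command(
    ckpt_file="checkpoints/F5TTS_Base/model.pt",
    vocab_file="checkpoints/vocab.txt",
    ref_audio="reference.wav",
    ref_text="Das ist eine kurze Referenzaufnahme.",
    gen_text="Guten Morgen, wie kann ich Ihnen helfen?",
)
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run(command, check=True) once the F5-TTS package is installed. Building the command as a list avoids quoting problems with German text that contains spaces or punctuation.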

Contact

Acknowledgements

The authors acknowledge the financial support of the German Federal Ministry for Education and Research (BMBF) through the project «KI-Servicezentrum Berlin Brandenburg» (01IS22092).

Model tree for aihpi/F5-TTS-German

  • Base model: SWivid/F5-TTS
  • This model: fine-tuned from the base model