E2-F5-TTS

mrfakename committed 13 days ago

Commit 811c6c1 (1 parent: 7804f9c)

Sync from GitHub repo

This Space is synced from the GitHub repo: https://github.com/SWivid/F5-TTS. Please submit contributions to the Space there.

Files changed (3)

README_REPO.md CHANGED

@@ -72,6 +72,7 @@ Currently supported features:
 - Basic TTS with Chunk Inference
 - Multi-Style / Multi-Speaker Generation
 - Voice Chat powered by Qwen2.5-3B-Instruct
+- [Custom model](src/f5_tts/infer/SHARED.md) inference (local only)
 ```bash
 # Launch a Gradio app (web interface)

src/f5_tts/infer/README.md CHANGED

@@ -2,7 +2,7 @@
 The pretrained model checkpoints can be reached at [🤗 Hugging Face](https://huggingface.co/SWivid/F5-TTS) and [🤖 Model Scope](https://www.modelscope.cn/models/SWivid/F5-TTS_Emilia-ZH-EN), or will be automatically downloaded when running inference scripts.
-More checkpoints with whole community efforts can be found [here](src/f5_tts/infer/SHARED.md), supporting more languages.
+**More checkpoints with whole community efforts can be found in [SHARED.md](SHARED.md), supporting more languages.**
 Currently support **30s for a single** generation, which is the **total length** including both prompt and output audio. However, you can provide `infer_cli` and `infer_gradio` with longer text, will automatically do chunk generation. Long reference audio will be **clip short to ~15s**.

src/f5_tts/infer/SHARED.md CHANGED

@@ -4,6 +4,7 @@
 - This document is serving as a quick lookup table for the community training/finetuning result, with various language support.
 - The models in this repository are open source and are based on voluntary contributions from contributors.
 - The use of models must be conditioned on respect for the respective creators. The convenience brought comes from their efforts.
+- Welcome to pull request sharing your result here.
 <!-- omit in toc -->
@@ -25,7 +26,7 @@
 MODEL_CKPT: hf://SWivid/F5-TTS/F5TTS_Base/model_1200000.safetensors
 VOCAB_FILE: hf://SWivid/F5-TTS/F5TTS_Base/vocab.txt
 ```
-***Other infos, e.g. Github Repo, Usage Instruction, Tutorial (Blog, Video, etc.) ...***
+*Other infos, e.g. Link to some sampled results, Github repo, Usage instruction, Tutorial (Blog, Video, etc.) ...*
 ### Mandarin
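
Note on the `hf://` entries in SHARED.md: the `MODEL_CKPT` and `VOCAB_FILE` values appear to follow an `hf://<owner>/<repo>/<path in repo>` layout. The sketch below shows one way such a value could be split into a repository id and a file path. The helper name `split_hf_uri` is hypothetical, and the layout is an assumption read off the two examples in the diff, not necessarily how F5-TTS resolves these paths.

```python
from typing import Tuple


def split_hf_uri(uri: str) -> Tuple[str, str]:
    """Split 'hf://<owner>/<repo>/<path>' into (repo_id, path_in_repo).

    Hypothetical helper: assumes the first two path segments name the
    Hugging Face repository and the remainder is the file path inside it.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    # Split into at most three parts: owner, repo, and the rest of the path.
    parts = uri[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://<owner>/<repo>/<path>, got {uri!r}")
    owner, repo, path_in_repo = parts
    return f"{owner}/{repo}", path_in_repo


repo_id, filename = split_hf_uri(
    "hf://SWivid/F5-TTS/F5TTS_Base/model_1200000.safetensors"
)
print(repo_id, filename)
```

If this reading of the layout holds, the resulting pair could be passed to `huggingface_hub.hf_hub_download(repo_id=..., filename=...)` to fetch the file locally.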