mrfakename committed 811c6c1 (parent: 7804f9c)
Sync from GitHub repo
This Space is synced from the GitHub repo: https://github.com/SWivid/F5-TTS. Please submit contributions to the Space there.
- README_REPO.md +1 -0
- src/f5_tts/infer/README.md +1 -1
- src/f5_tts/infer/SHARED.md +2 -1
README_REPO.md
@@ -72,6 +72,7 @@ Currently supported features:
 - Basic TTS with Chunk Inference
 - Multi-Style / Multi-Speaker Generation
 - Voice Chat powered by Qwen2.5-3B-Instruct
+- [Custom model](src/f5_tts/infer/SHARED.md) inference (local only)

 ```bash
 # Launch a Gradio app (web interface)
src/f5_tts/infer/README.md
@@ -2,7 +2,7 @@

 The pretrained model checkpoints can be reached at [🤗 Hugging Face](https://huggingface.co/SWivid/F5-TTS) and [🤖 Model Scope](https://www.modelscope.cn/models/SWivid/F5-TTS_Emilia-ZH-EN), or will be automatically downloaded when running inference scripts.

-More checkpoints with whole community efforts can be found [
+**More checkpoints with whole community efforts can be found in [SHARED.md](SHARED.md), supporting more languages.**

 Currently support **30s for a single** generation, which is the **total length** including both prompt and output audio. However, you can provide `infer_cli` and `infer_gradio` with longer text, will automatically do chunk generation. Long reference audio will be **clip short to ~15s**.
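The README text in this hunk describes a 30 s total budget per generation (prompt plus output), a ~15 s clip for long reference audio, and automatic chunking of longer text. The sketch below only illustrates that arithmetic and one simple way to chunk text. It is not the F5-TTS implementation: `output_budget`, `split_text`, the constant names, and the character-based `max_chars` limit are all hypothetical choices made for this example.

```python
import re

TOTAL_BUDGET_S = 30.0  # total length per generation, as stated in the README text
REF_CLIP_S = 15.0      # approximate clip length for long reference audio


def output_budget(ref_seconds: float) -> float:
    """Seconds left for generated audio after the (possibly clipped) prompt."""
    used = min(ref_seconds, REF_CLIP_S)
    return max(TOTAL_BUDGET_S - used, 0.0)


def split_text(text: str, max_chars: int) -> list[str]:
    """Greedily pack sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation (Latin and CJK), dropping empties.
    sentences = [s for s in re.split(r"(?<=[.!?;。！？；])\s*", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Under these assumptions, a 20 s reference clip counts as 15 s and leaves 15 s for output, while a 5 s clip leaves 25 s. Each chunk returned by `split_text` would then be synthesized as a separate generation.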
src/f5_tts/infer/SHARED.md
@@ -4,6 +4,7 @@
 - This document is serving as a quick lookup table for the community training/finetuning result, with various language support.
 - The models in this repository are open source and are based on voluntary contributions from contributors.
 - The use of models must be conditioned on respect for the respective creators. The convenience brought comes from their efforts.
+- Welcome to pull request sharing your result here.


 <!-- omit in toc -->
@@ -25,7 +26,7 @@
 MODEL_CKPT: hf://SWivid/F5-TTS/F5TTS_Base/model_1200000.safetensors
 VOCAB_FILE: hf://SWivid/F5-TTS/F5TTS_Base/vocab.txt
 ```
-
+*Other infos, e.g. Link to some sampled results, Github repo, Usage instruction, Tutorial (Blog, Video, etc.) ...*

 ### Mandarin
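The `MODEL_CKPT` and `VOCAB_FILE` entries in this hunk use `hf://` paths. As a minimal sketch, assuming the first two path segments form the Hugging Face repo id and the remainder is the file path inside that repo, such an entry could be split as follows. `split_hf_uri` is a hypothetical helper written for this example and is not part of F5-TTS.

```python
def split_hf_uri(uri: str) -> tuple[str, str]:
    """Split 'hf://<owner>/<repo>/<path/in/repo>' into (repo_id, filename).

    Assumption: the first two segments are the repo id, the rest is the file path.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// path: {uri!r}")
    parts = uri[len(prefix):].split("/")
    # Require owner, repo, and at least one non-empty path segment.
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected hf://<owner>/<repo>/<file>, got {uri!r}")
    repo_id = "/".join(parts[:2])
    filename = "/".join(parts[2:])
    return repo_id, filename


repo_id, filename = split_hf_uri(
    "hf://SWivid/F5-TTS/F5TTS_Base/model_1200000.safetensors"
)
print(repo_id, filename)
```

The resulting pair matches the `repo_id` and `filename` arguments that download helpers such as `huggingface_hub.hf_hub_download` accept, although whether the project resolves these paths that way is not stated in this commit.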