mrfakename committed on
Commit
811c6c1
1 Parent(s): 7804f9c

Sync from GitHub repo

This Space is synced from the GitHub repo: https://github.com/SWivid/F5-TTS. Please submit contributions to the Space there

README_REPO.md CHANGED
@@ -72,6 +72,7 @@ Currently supported features:
  - Basic TTS with Chunk Inference
  - Multi-Style / Multi-Speaker Generation
  - Voice Chat powered by Qwen2.5-3B-Instruct
+ - [Custom model](src/f5_tts/infer/SHARED.md) inference (local only)
 
  ```bash
  # Launch a Gradio app (web interface)
src/f5_tts/infer/README.md CHANGED
@@ -2,7 +2,7 @@
 
  The pretrained model checkpoints can be reached at [🤗 Hugging Face](https://huggingface.co/SWivid/F5-TTS) and [🤖 Model Scope](https://www.modelscope.cn/models/SWivid/F5-TTS_Emilia-ZH-EN), or will be automatically downloaded when running inference scripts.
 
- More checkpoints with whole community efforts can be found [here](src/f5_tts/infer/SHARED.md), supporting more languages.
+ **More checkpoints with whole community efforts can be found in [SHARED.md](SHARED.md), supporting more languages.**
 
  Currently support **30s for a single** generation, which is the **total length** including both prompt and output audio. However, you can provide `infer_cli` and `infer_gradio` with longer text, will automatically do chunk generation. Long reference audio will be **clip short to ~15s**.
 
src/f5_tts/infer/SHARED.md CHANGED
@@ -4,6 +4,7 @@
4
  - This document is serving as a quick lookup table for the community training/finetuning result, with various language support.
5
  - The models in this repository are open source and are based on voluntary contributions from contributors.
6
  - The use of models must be conditioned on respect for the respective creators. The convenience brought comes from their efforts.
 
7
 
8
 
9
  <!-- omit in toc -->
@@ -25,7 +26,7 @@
25
  MODEL_CKPT: hf://SWivid/F5-TTS/F5TTS_Base/model_1200000.safetensors
26
  VOCAB_FILE: hf://SWivid/F5-TTS/F5TTS_Base/vocab.txt
27
  ```
28
- ***Other infos, e.g. Github Repo, Usage Instruction, Tutorial (Blog, Video, etc.) ...***
29
 
30
  ### Mandarin
31
 
 
4
  - This document is serving as a quick lookup table for the community training/finetuning result, with various language support.
5
  - The models in this repository are open source and are based on voluntary contributions from contributors.
6
  - The use of models must be conditioned on respect for the respective creators. The convenience brought comes from their efforts.
7
+ - Welcome to pull request sharing your result here.
8
 
9
 
10
  <!-- omit in toc -->
 
26
  MODEL_CKPT: hf://SWivid/F5-TTS/F5TTS_Base/model_1200000.safetensors
27
  VOCAB_FILE: hf://SWivid/F5-TTS/F5TTS_Base/vocab.txt
28
  ```
29
+ *Other infos, e.g. Link to some sampled results, Github repo, Usage instruction, Tutorial (Blog, Video, etc.) ...*
30
 
31
  ### Mandarin
32