Update README.md
README.md
CHANGED
@@ -1,12 +1,13 @@
 ---
-title: Voice Conversion
+title: Amphion Vevo Voice Conversion
 emoji: 🎤
 colorFrom: indigo
 colorTo: purple
 sdk: gradio
-sdk_version:
+sdk_version: 4.8.0
 app_file: app.py
 pinned: false
+python_version: "3.10"
 ---
 
 # Amphion's Vevo - Voice Conversion & TTS
@@ -19,27 +20,44 @@ This is a Gradio web interface for the Vevo voice conversion model from the Amph
 
 ## Usage
 
-1. Select
-
---
---
-
+1. Select mode:
+- **Voice**: Convert voice with both style and timbre transfer
+- **Timbre**: Convert only the timbre of the voice
+- **TTS**: Generate speech from text with voice cloning
+
+2. Upload audio files based on mode:
+- Source Audio: Your input audio (for voice and timbre modes)
+- Reference Style: Style reference (for voice and TTS modes)
+- Reference Timbre: Voice reference (required for all modes)
+
 3. For TTS mode:
 - Enter the text you want to convert to speech
-- Optionally provide reference text
-
+- Optionally provide reference text
+- Select source and reference languages
+
+4. Adjust Flow Matching Steps (1-64, default: 32)
+- Higher values give better quality but take longer
+- Lower values are faster but may reduce quality
+
 5. Click "Generate" to create the converted audio
 
+## Sample Files
+
+Sample audio files are available in the `Amphion/models/vc/vevo/wav/` directory:
+- arabic_male.wav
+- source.wav
+
+## Technical Requirements
+
+- Python 3.10+
+- CUDA-capable GPU recommended for faster inference
+- Minimum 12GB storage space for models
+
 ## Models
 
-The application
+The application automatically downloads required models from Hugging Face:
 - Content Tokenizer (vq32)
 - Content-Style Tokenizer (vq8192)
 - Autoregressive Transformer
 - Flow Matching Transformer
-- Vocoder
-
-## Technical Requirements
-
-- Python 3.8+
-- CUDA-capable GPU recommended for faster inference
+- Vocoder
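The mode and input rules in the updated Usage section can be summarized as a small lookup table. The sketch below is only an illustration of those rules. The names `REQUIRED_INPUTS`, `missing_inputs`, and `clamp_steps` are hypothetical and do not come from `app.py`, and treating the text field as a required TTS input is an assumption based on step 3.

```python
# Illustrative sketch only: these names are hypothetical, not taken from app.py.
# Inputs each mode needs, as listed in the updated Usage section.
REQUIRED_INPUTS = {
    "voice": {"source_audio", "reference_style", "reference_timbre"},
    "timbre": {"source_audio", "reference_timbre"},
    # Assumption: TTS also needs the text entered in step 3.
    "tts": {"text", "reference_style", "reference_timbre"},
}

# Flow Matching Steps range and default stated in step 4.
MIN_STEPS, MAX_STEPS, DEFAULT_STEPS = 1, 64, 32


def missing_inputs(mode, provided):
    """Return the sorted names of required inputs that were not provided."""
    if mode not in REQUIRED_INPUTS:
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(REQUIRED_INPUTS[mode] - set(provided))


def clamp_steps(steps=DEFAULT_STEPS):
    """Clamp a requested step count to the documented 1-64 range."""
    return max(MIN_STEPS, min(MAX_STEPS, int(steps)))
```

For example, `missing_inputs("timbre", ["source_audio"])` reports that the timbre reference is still needed, and `clamp_steps(100)` falls back to the upper bound of 64.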