naonauno committed on
Commit 1b37547 · verified · 1 Parent(s): 4ae4a65

Update README.md

  1. README.md +34 -16
README.md CHANGED
@@ -1,12 +1,13 @@
 ---
-title: Voice Conversion
+title: Amphion Vevo Voice Conversion
 emoji: 🎤
 colorFrom: indigo
 colorTo: purple
 sdk: gradio
-sdk_version: 5.12.0
+sdk_version: 4.8.0
 app_file: app.py
 pinned: false
+python_version: "3.10"
 ---
 
 # Amphion's Vevo - Voice Conversion & TTS
@@ -19,27 +20,44 @@ This is a Gradio web interface for the Vevo voice conversion model from the Amph
 
 ## Usage
 
-1. Select the mode you want to use (voice, timbre, or TTS)
-2. Upload the required audio files:
-- Source audio (for voice and timbre modes)
-- Reference style audio (for voice and TTS modes)
-- Reference timbre audio (for all modes)
+1. Select mode:
+- **Voice**: Convert voice with both style and timbre transfer
+- **Timbre**: Convert only the timbre of the voice
+- **TTS**: Generate speech from text with voice cloning
+
+2. Upload audio files based on mode:
+- Source Audio: Your input audio (for voice and timbre modes)
+- Reference Style: Style reference (for voice and TTS modes)
+- Reference Timbre: Voice reference (required for all modes)
+
 3. For TTS mode:
 - Enter the text you want to convert to speech
-- Optionally provide reference text and select languages
-4. Adjust the Flow Matching Steps if needed (default: 32)
+- Optionally provide reference text
+- Select source and reference languages
+
+4. Adjust Flow Matching Steps (1-64, default: 32)
+- Higher values give better quality but take longer
+- Lower values are faster but may reduce quality
+
 5. Click "Generate" to create the converted audio
 
+## Sample Files
+
+Sample audio files are available in the `Amphion/models/vc/vevo/wav/` directory:
+- arabic_male.wav
+- source.wav
+
+## Technical Requirements
+
+- Python 3.10+
+- CUDA-capable GPU recommended for faster inference
+- Minimum 12GB storage space for models
+
 ## Models
 
-The application uses the following models from Hugging Face:
+The application automatically downloads required models from Hugging Face:
 - Content Tokenizer (vq32)
 - Content-Style Tokenizer (vq8192)
 - Autoregressive Transformer
 - Flow Matching Transformer
-- Vocoder
-
-## Technical Requirements
-
-- Python 3.8+
-- CUDA-capable GPU recommended for faster inference
+- Vocoder
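
The updated Usage section says which inputs each mode needs and that Flow Matching Steps range from 1 to 64 with a default of 32. The sketch below shows one way those rules could be checked before generation. It is not taken from `app.py`. The names `REQUIRED_INPUTS`, `missing_inputs`, and `clamp_steps`, and the input keys, are hypothetical and chosen only for illustration. Treating the text field as required in TTS mode is also an assumption.

```python
from typing import Optional

# Hypothetical mapping derived from the README's Usage section.
# The key names are illustrative and not taken from the actual app.py.
REQUIRED_INPUTS = {
    "voice": ("source_audio", "reference_style", "reference_timbre"),
    "timbre": ("source_audio", "reference_timbre"),
    "tts": ("text", "reference_style", "reference_timbre"),
}

# Range and default as stated in the README: 1-64, default 32.
MIN_STEPS, MAX_STEPS, DEFAULT_STEPS = 1, 64, 32


def missing_inputs(mode: str, **inputs: Optional[str]) -> list[str]:
    """Return the required input names that are absent or empty for a mode."""
    key = mode.lower()
    if key not in REQUIRED_INPUTS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [name for name in REQUIRED_INPUTS[key] if not inputs.get(name)]


def clamp_steps(steps: Optional[int] = None) -> int:
    """Use the default when no value is given, otherwise clamp into 1-64."""
    if steps is None:
        return DEFAULT_STEPS
    return max(MIN_STEPS, min(MAX_STEPS, int(steps)))


if __name__ == "__main__":
    # File names come from the README's Sample Files section.
    print(missing_inputs("voice", source_audio="source.wav",
                         reference_timbre="arabic_male.wav"))
    print(clamp_steps(100))
```

A check like this could run before the model is invoked, so a missing upload produces a clear message instead of a failure partway through inference.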