---
title: Amphion Vevo Voice Conversion
emoji: 🎤
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.8.0
app_file: app.py
pinned: false
python_version: "3.10"
---

# Amphion's Vevo - Voice Conversion & TTS

This is a Gradio web interface for the Vevo voice conversion model from the Amphion toolkit. It supports:

- Voice conversion (transferring both style and timbre)
- Timbre-only conversion
- Text-to-Speech with voice cloning

## Usage

1. Select mode:
   - **Voice**: Convert voice with both style and timbre transfer
   - **Timbre**: Convert only the timbre of the voice
   - **TTS**: Generate speech from text with voice cloning

2. Upload audio files based on mode:
   - Source Audio: Your input audio (for voice and timbre modes)
   - Reference Style: Style reference (for voice and TTS modes)
   - Reference Timbre: Voice reference (required for all modes)

3. For TTS mode:
   - Enter the text you want to convert to speech
   - Optionally provide reference text
   - Select source and reference languages

4. Adjust Flow Matching Steps (1-64, default: 32)
   - Higher values give better quality but take longer
   - Lower values are faster but may reduce quality

5. Click "Generate" to create the converted audio

## Sample Files

Sample audio files are available in the `Amphion/models/vc/vevo/wav/` directory:

- arabic_male.wav
- source.wav

## Technical Requirements

- Python 3.10+
- CUDA-capable GPU recommended for faster inference
- Minimum 12GB storage space for models

## Models

The application automatically downloads required models from Hugging Face:

- Content Tokenizer (vq32)
- Content-Style Tokenizer (vq8192)
- Autoregressive Transformer
- Flow Matching Transformer
- Vocoder
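
As a rough illustration of the per-mode input rules and the Flow Matching Steps range described under Usage, the following minimal sketch checks which inputs are still missing for a given mode and keeps the step count within 1-64. The function names, dictionary keys, and constants are hypothetical and are not taken from `app.py`; the actual interface may validate inputs differently.

```python
# Hypothetical helpers; names and keys are illustrative, not from app.py.

# Inputs each mode needs, following the Usage section:
# source audio for voice/timbre, reference style for voice/TTS,
# reference timbre for every mode, and text for TTS.
REQUIRED_INPUTS = {
    "voice": {"source_audio", "reference_style", "reference_timbre"},
    "timbre": {"source_audio", "reference_timbre"},
    "tts": {"text", "reference_style", "reference_timbre"},
}

# Flow Matching Steps range and default stated in the Usage section.
MIN_STEPS, MAX_STEPS, DEFAULT_STEPS = 1, 64, 32


def missing_inputs(mode, provided):
    """Return a sorted list of required inputs that are absent or empty."""
    key = mode.lower()
    if key not in REQUIRED_INPUTS:
        raise ValueError(f"Unknown mode: {mode!r}")
    # An input counts as provided only if it is present and non-empty.
    return sorted(
        name for name in REQUIRED_INPUTS[key] if not provided.get(name)
    )


def clamp_steps(steps=None):
    """Return a step count inside the allowed range, or the default."""
    if steps is None:
        return DEFAULT_STEPS
    return max(MIN_STEPS, min(MAX_STEPS, int(steps)))


if __name__ == "__main__":
    # Timbre mode with only a source file: the timbre reference is missing.
    print(missing_inputs("Timbre", {"source_audio": "source.wav"}))
    # An out-of-range step count is pulled back to the upper bound.
    print(clamp_steps(100))
```

A check like this could run before inference starts, so that a missing reference file produces a clear message instead of a failure partway through generation.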