VietTTS: An Open-Source Vietnamese Text to Speech

VietTTS is an open-source toolkit providing the community with a powerful Vietnamese TTS model, capable of natural voice synthesis and robust voice cloning. Designed for effective experimentation, VietTTS supports research and application in Vietnamese voice technologies.

โญ Key Features

  • TTS: Text-to-Speech generation with any voice via prompt audio
  • VC: Voice Conversion (TODO)

๐Ÿ› ๏ธ Installation

VietTTS can be installed via either a Python installer or Docker.

Python Installer

git clone https://github.com/dangvansam/viet-tts.git
cd viet-tts

# (Optional) Create a Python environment with conda; virtualenv also works
conda create --name viettts python=3.10
conda activate viettts

# Install
pip install -e . && pip cache purge

Docker

  1. Install Docker, NVIDIA Driver, NVIDIA Container Toolkit, and CUDA.

  2. Run the following commands:

git clone https://github.com/dangvansam/viet-tts.git
cd viet-tts

# Build docker images
docker compose build

# Run with docker-compose - will create server at: http://localhost:8298
docker compose up -d

# Run with docker run - will create server at: http://localhost:8298
docker run -itd --gpus all -p 8298:8298 -v ./pretrained-models:/app/pretrained-models --name viet-tts-service viet-tts:latest viettts server --host 0.0.0.0 --port 8298

# Show available voices
docker exec viet-tts-service viettts show-voices
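
After `docker compose up -d` or `docker run`, the container may need some time before it accepts requests. The sketch below is one way to wait for that from Python. It only checks that the TCP port published above (8298) accepts connections; it does not confirm that the model has finished loading. The helper name `wait_for_port` and its default timeout are illustrative assumptions, not part of VietTTS.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: VietTTS does not ship this function.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_port("localhost", 8298)` before sending the first request avoids connection errors while the service is still starting.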

🚀 Usage

Built-in Voices 🤠

You can use the built-in voices below to synthesize speech.

ID  Voice                 Gender
1   nsnd-le-chuc          👨
2   speechify_10          👩
3   atuan                 👨
4   speechify_11          👩
5   cdteam                👨
6   speechify_12          👩
7   cross_lingual_prompt  👩
8   speechify_2           👩
9   diep-chi              👨
10  speechify_3           👩
11  doremon               👨
12  speechify_4           👩
13  jack-sparrow          👨
14  speechify_5           👩
15  nguyen-ngoc-ngan      👩
16  speechify_6           👩
17  nu-nhe-nhang          👩
18  speechify_7           👩
19  quynh                 👩
20  speechify_8           👩
21  speechify_9           👩
22  son-tung-mtp          👨
23  zero_shot_prompt      👩
24  speechify_1           👩

Command Line Interface (CLI)

The VietTTS Command Line Interface (CLI) allows you to quickly generate speech directly from the terminal. Here's how to use it:

# Usage
viettts --help

# Start API Server
viettts server --host 0.0.0.0 --port 8298

# Synthesize speech from text
viettts synthesis --text "Xin chào" --voice 0 --output test.wav

# List all built-in voices
viettts show-voices
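
To synthesize several sentences in one go, the CLI can be driven from a short Python script. The sketch below uses only the `viettts synthesis` flags shown above (`--text`, `--voice`, `--output`). The function names and the `line_000.wav` naming scheme are illustrative assumptions. Each call starts a new process, so for large batches the API server is likely the better fit.

```python
import subprocess
from pathlib import Path


def build_synthesis_command(text: str, voice: str, output: Path) -> list[str]:
    """Build the argv for `viettts synthesis` using only the documented flags."""
    return ["viettts", "synthesis", "--text", text, "--voice", str(voice), "--output", str(output)]


def synthesize_all(sentences: list[str], voice: str, out_dir: Path) -> list[Path]:
    """Run one `viettts synthesis` call per sentence and return the output paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for i, sentence in enumerate(sentences):
        output = out_dir / f"line_{i:03d}.wav"  # hypothetical naming scheme
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(build_synthesis_command(sentence, voice, output), check=True)
        outputs.append(output)
    return outputs
```

For example, `synthesize_all(["Xin chào", "Việt Nam"], "0", Path("out"))` would write `out/line_000.wav` and `out/line_001.wav`, assuming `viettts` is installed and on the `PATH`.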

API Client

Python (OpenAI Client)

You need to set environment variables for the OpenAI Client:

# Set base_url and API key as environment variables
export OPENAI_BASE_URL=http://localhost:8298
export OPENAI_API_KEY=viet-tts # not used in the current version

To create speech from input text:

from pathlib import Path
from openai import OpenAI

# Reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment
client = OpenAI()

output_file_path = Path(__file__).parent / "speech.wav"

with client.audio.speech.with_streaming_response.create(
  model='tts-1',
  voice='cdteam',
  input='Xin chào Việt Nam.',
  speed=1.0,
  response_format='wav'
) as response:
  # Stream the audio to the path defined above
  response.stream_to_file(output_file_path)

CURL

curl http://localhost:8298/v1/audio/speech \
  -H "Authorization: Bearer viet-tts" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Xin chào Việt Nam.",
    "voice": "son-tung-mtp"
  }' \
  --output speech.wav
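
The same request can also be sent without the OpenAI client. The sketch below mirrors the curl call above (same URL, headers, and JSON fields) using only Python's standard library. The function names are illustrative assumptions, and the code assumes the server returns the audio bytes directly in the response body, as the `--output speech.wav` flag in the curl example suggests.

```python
import json
import urllib.request


def build_speech_request(text, voice, base_url="http://localhost:8298", api_key="viet-tts"):
    """Build a POST request equivalent to the curl example (hypothetical helper)."""
    payload = {"model": "tts-1", "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, voice, output_path):
    """Send the request and write the response body to output_path."""
    request = build_speech_request(text, voice)
    with urllib.request.urlopen(request) as response, open(output_path, "wb") as f:
        f.write(response.read())
```

With the server running, `synthesize_to_file("Xin chào Việt Nam.", "son-tung-mtp", "speech.wav")` should produce the same file as the curl command.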

Node

import fs from "fs";
import path from "path";
import OpenAI from "openai";

const openai = new OpenAI();

const speechFile = path.resolve("./speech.wav");

async function main() {
  // Request speech from the server; the audio is saved to speech.wav
  const response = await openai.audio.speech.create({
    model: "tts-1",
    voice: "1",
    input: "Xin chào Việt Nam.",
  });
  console.log(speechFile);
  const buffer = Buffer.from(await response.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
}
main();

๐Ÿ™ Acknowledgement

  • ๐Ÿ’ก Borrowed code from Cosyvoice
  • ๐ŸŽ™๏ธ VAD model from silero-vad
  • ๐Ÿ“ Text normalization with Vinorm

📜 License

The VietTTS source code is released under the Apache 2.0 License. Pre-trained models and audio samples are licensed under the CC BY-NC License because they are based on an in-the-wild dataset. We apologize for any inconvenience this may cause.

โš ๏ธ Disclaimer

The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.

💬 Contact
