Commit 1d118f6 by dangvansam · 1 parent: a3e9201
Update README.md

README.md CHANGED
@@ -1,3 +1,193 @@
---
language:
- vi
- en
pipeline_tag: text-to-speech
---

<!-- # VietTTS: An Open-Source Vietnamese Text to Speech -->
<p align="center">
<img src="https://github.com/dangvansam/viet-tts/blob/main/assets/viet-tts-medium.png?raw=true" style="width: 22%">
<h1 align="center" style="color: white; font-weight: bold; font-family:roboto"><span style="color: white; font-weight: bold; font-family:roboto">VietTTS</span>: An Open-Source Vietnamese Text to Speech</h1>
</p>
<p align="center">
<a href="https://github.com/dangvansam/viet-tts"><img src="https://img.shields.io/github/stars/dangvansam/viet-tts?style=social"></a>
</p>

**VietTTS** is an open-source toolkit that provides the community with a Vietnamese TTS model capable of natural voice synthesis and robust voice cloning. Designed for effective experimentation, **VietTTS** supports research and application in Vietnamese voice technologies.

## ⭐ Key Features
- **TTS**: Text-to-Speech generation with any voice via prompt audio
- **VC**: Voice Conversion (TODO)

## 🛠️ Installation

VietTTS can be installed via either a Python installer or Docker.

### Python Installer
```bash
git clone https://github.com/dangvansam/viet-tts.git
cd viet-tts

# (Optional) Create a Python environment with conda; virtualenv also works
conda create --name viettts python=3.10
conda activate viettts

# Install
pip install -e . && pip cache purge
```

### Docker

1. Install [Docker](https://docs.docker.com/get-docker/), [NVIDIA Driver](https://www.nvidia.com/download/index.aspx), [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html), and [CUDA](https://developer.nvidia.com/cuda-downloads).

2. Run the following commands:
```bash
git clone https://github.com/dangvansam/viet-tts.git
cd viet-tts

# Build docker images
docker compose build

# Run with docker compose - will create server at: http://localhost:8298
docker compose up -d

# Run with docker run - will create server at: http://localhost:8298
docker run -itd --gpus all -p 8298:8298 -v ./pretrained-models:/app/pretrained-models --name viet-tts-service viet-tts:latest viettts server --host 0.0.0.0 --port 8298

# Show available voices
docker exec viet-tts-service viettts show-voices
```
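After `docker compose up -d` or `docker run -d`, the container may need some time before the server accepts connections. The sketch below is one possible way to wait for it, assuming only that the server listens on TCP port 8298 as in the commands above; `wait_for_port` is a hypothetical helper and not part of VietTTS.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows the port is open, not that models are loaded.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (assumes the container publishes port 8298 on this machine):
# ready = wait_for_port("localhost", 8298, timeout=120.0)
```

An open port does not guarantee that the model has finished loading, so a first request may still be slow.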

## 🚀 Usage

### Built-in Voices 🤠
You can use the available voices below to synthesize speech.
<details>
<summary>Expand</summary>

| ID  | Voice                 | Gender | Play Audio                                       |
|-----|-----------------------|--------|--------------------------------------------------|
| 1   | nsnd-le-chuc          | 👨     | <audio controls src="samples/nsnd-le-chuc.mp3"></audio> |
| 2   | speechify_10          | 👩     | <audio controls src="samples/speechify_10.wav"></audio> |
| 3   | atuan                 | 👨     | <audio controls src="samples/atuan.wav"></audio> |
| 4   | speechify_11          | 👩     | <audio controls src="samples/speechify_11.wav"></audio> |
| 5   | cdteam                | 👨     | <audio controls src="samples/cdteam.wav"></audio> |
| 6   | speechify_12          | 👩     | <audio controls src="samples/speechify_12.wav"></audio> |
| 7   | cross_lingual_prompt  | 👩     | <audio controls src="samples/cross_lingual_prompt.wav"></audio> |
| 8   | speechify_2           | 👩     | <audio controls src="samples/speechify_2.wav"></audio> |
| 9   | diep-chi              | 👨     | <audio controls src="samples/diep-chi.wav"></audio> |
| 10  | speechify_3           | 👩     | <audio controls src="samples/speechify_3.wav"></audio> |
| 11  | doremon               | 👨     | <audio controls src="samples/doremon.mp3"></audio> |
| 12  | speechify_4           | 👩     | <audio controls src="samples/speechify_4.wav"></audio> |
| 13  | jack-sparrow          | 👨     | <audio controls src="samples/jack-sparrow.mp3"></audio> |
| 14  | speechify_5           | 👩     | <audio controls src="samples/speechify_5.wav"></audio> |
| 15  | nguyen-ngoc-ngan      | 👩     | <audio controls src="samples/nguyen-ngoc-ngan.wav"></audio> |
| 16  | speechify_6           | 👩     | <audio controls src="samples/speechify_6.wav"></audio> |
| 17  | nu-nhe-nhang          | 👩     | <audio controls src="samples/nu-nhe-nhang.wav"></audio> |
| 18  | speechify_7           | 👩     | <audio controls src="samples/speechify_7.wav"></audio> |
| 19  | quynh                 | 👩     | <audio controls src="samples/quynh.wav"></audio> |
| 20  | speechify_8           | 👩     | <audio controls src="samples/speechify_8.wav"></audio> |
| 21  | speechify_9           | 👩     | <audio controls src="samples/speechify_9.wav"></audio> |
| 22  | son-tung-mtp          | 👨     | <audio controls src="samples/son-tung-mtp.wav"></audio> |
| 23  | zero_shot_prompt      | 👩     | <audio controls src="samples/zero_shot_prompt.wav"></audio> |
| 24  | speechify_1           | 👩     | <audio controls src="samples/speechify_1.wav"></audio> |

<div>
</div>
</details>

### Command Line Interface (CLI)
The VietTTS Command Line Interface (CLI) lets you generate speech directly from the terminal. Here's how to use it:
```bash
# Usage
viettts --help

# Start API Server
viettts server --host 0.0.0.0 --port 8298

# Synthesize speech from text
viettts synthesis --text "Xin chào" --voice 0 --output test.wav

# List all built-in voices
viettts show-voices
```
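To synthesize several sentences in one go, the `viettts synthesis` command above can be driven from a script. The following is a minimal sketch that relies only on the `--text`, `--voice`, and `--output` flags shown above; the helper names and the `utterance_NNN.wav` file naming are illustrative assumptions, not part of VietTTS.

```python
import subprocess
from pathlib import Path


def build_synthesis_command(text: str, voice: str, output: Path) -> list[str]:
    """Build the argument list for one `viettts synthesis` call."""
    return ["viettts", "synthesis", "--text", text, "--voice", voice, "--output", str(output)]


def synthesize_all(sentences: list[str], voice: str, out_dir: Path) -> list[Path]:
    """Run the CLI once per sentence and return the requested output paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, sentence in enumerate(sentences):
        output = out_dir / f"utterance_{index:03d}.wav"
        # check=True raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.run(build_synthesis_command(sentence, voice, output), check=True)
        outputs.append(output)
    return outputs


# Example (requires the `viettts` command to be installed and on PATH):
# synthesize_all(["Xin chào", "Xin chào Việt Nam."], voice="0", out_dir=Path("out"))
```

Passing the arguments as a list avoids shell quoting problems with Vietnamese text that contains spaces or punctuation.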

### API Client
#### Python (OpenAI Client)
You need to set environment variables for the OpenAI Client:
```bash
# Set base_url and API key as environment variables
export OPENAI_BASE_URL=http://localhost:8298
export OPENAI_API_KEY=viet-tts # not used in the current version
```
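If exporting variables is inconvenient, the same two values can be passed to the client constructor instead. This is a sketch under the assumption that the server address and placeholder key match the values above; `client_settings` and `make_client` are hypothetical helper names.

```python
import os


def client_settings(default_base_url: str = "http://localhost:8298",
                    default_api_key: str = "viet-tts") -> dict[str, str]:
    """Resolve the base URL and API key, preferring the environment variables."""
    return {
        "base_url": os.environ.get("OPENAI_BASE_URL", default_base_url),
        "api_key": os.environ.get("OPENAI_API_KEY", default_api_key),
    }


def make_client():
    """Create an OpenAI client pointed at the VietTTS server."""
    # Imported here so client_settings() works without the openai package installed.
    from openai import OpenAI

    return OpenAI(**client_settings())
```

With this approach the environment variables still take precedence when they are set, and the defaults are used otherwise.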
To create speech from input text:
```python
from pathlib import Path
from openai import OpenAI

# Reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment
client = OpenAI()

output_file_path = Path(__file__).parent / "speech.wav"

with client.audio.speech.with_streaming_response.create(
    model='tts-1',
    voice='cdteam',
    input='Xin chào Việt Nam.',
    speed=1.0,
    response_format='wav'
) as response:
    # Write the streamed audio next to this script
    response.stream_to_file(output_file_path)
```

#### CURL
```bash
curl http://localhost:8298/v1/audio/speech \
  -H "Authorization: Bearer viet-tts" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Xin chào Việt Nam.",
    "voice": "son-tung-mtp"
  }' \
  --output speech.wav
```
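The same request can be sent from Python without the `openai` package. The sketch below mirrors the URL, headers, and JSON body of the curl call above using only the standard library; the function names and defaults are illustrative assumptions.

```python
import json
import urllib.request


def build_speech_request(text: str, voice: str,
                         base_url: str = "http://localhost:8298",
                         api_key: str = "viet-tts") -> urllib.request.Request:
    """Build a POST request equivalent to the curl example."""
    payload = {"model": "tts-1", "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(text: str, voice: str, output_path: str) -> None:
    """Send the request and write the returned audio bytes to output_path."""
    request = build_speech_request(text, voice)
    with urllib.request.urlopen(request) as response, open(output_path, "wb") as out:
        out.write(response.read())


# Example (requires a running server):
# save_speech("Xin chào Việt Nam.", "son-tung-mtp", "speech.wav")
```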

#### Node
```js
import fs from "fs";
import path from "path";
import OpenAI from "openai";

// Reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment
const openai = new OpenAI();

const speechFile = path.resolve("./speech.wav");

async function main() {
  const response = await openai.audio.speech.create({
    model: "tts-1",
    voice: "1",
    input: "Xin chào Việt Nam.",
  });
  console.log(speechFile);
  // Collect the audio bytes and write them to speech.wav
  const buffer = Buffer.from(await response.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
}
main();
```

## 🙏 Acknowledgement
- 💡 Borrowed code from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice)
- 🎙️ VAD model from [silero-vad](https://github.com/snakers4/silero-vad)
- 📝 Text normalization with [Vinorm](https://github.com/v-nhandt21/Vinorm)

## 📜 License
The **VietTTS** source code is released under the **Apache 2.0 License**. Pre-trained models and audio samples are licensed under the **CC BY-NC License**, since they are based on an in-the-wild dataset. We apologize for any inconvenience this may cause.

## ⚠️ Disclaimer
The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.

## 💬 Contact
- Facebook: https://fb.com/sam.rngd
- GitHub: https://github.com/dangvansam
- Email: dangvansam98@gmail.com