dangvansam committed on
Commit
1d118f6
1 Parent(s): a3e9201

Update README.md

Files changed (1)
  1. README.md +191 -1
README.md CHANGED
@@ -1,3 +1,193 @@
1
  ---
2
- license: apache-2.0
3
  ---
1
  ---
2
+ language:
3
+ - vi
4
+ - en
5
+ pipeline_tag: text-to-speech
6
  ---
7
+
8
+ <!-- # VietTTS: An Open-Source Vietnamese Text to Speech -->
9
+ <p align="center">
10
+ <img src="https://github.com/dangvansam/viet-tts/blob/main/assets/viet-tts-medium.png?raw=true" style="width: 22%">
11
+ <h1 align="center" style="color: white; font-weight: bold; font-family:roboto"><span style="color: white; font-weight: bold; font-family:roboto">VietTTS</span>: An Open-Source Vietnamese Text to Speech</h1>
12
+ </p>
13
+ <p align="center">
14
+ <a href="https://github.com/dangvansam/viet-tts"><img src="https://img.shields.io/github/stars/dangvansam/viet-tts?style=social"></a>
15
+ </p>
16
+
17
+ **VietTTS** is an open-source toolkit providing the community with a powerful Vietnamese TTS model, capable of natural voice synthesis and robust voice cloning. Designed for effective experimentation, **VietTTS** supports research and application in Vietnamese voice technologies.
18
+
19
+ ## ⭐ Key Features
20
+ - **TTS**: Text-to-Speech generation with any voice via prompt audio
21
+ - **VC**: Voice Conversion (TODO)
22
+
23
+ ## 🛠️ Installation
24
+
25
+ VietTTS can be installed via either a Python installer or Docker.
26
+
27
+ ### Python Installer
28
+ ```bash
29
+ git clone https://github.com/dangvansam/viet-tts.git
30
+ cd viet-tts
31
+
32
+ # (Optional) Create a Python environment with conda (virtualenv also works)
33
+ conda create --name viettts python=3.10
34
+ conda activate viettts
35
+
36
+ # Install
37
+ pip install -e . && pip cache purge
38
+ ```
39
+
40
+ ### Docker
41
+
42
+ 1. Install [Docker](https://docs.docker.com/get-docker/), [NVIDIA Driver](https://www.nvidia.com/download/index.aspx), [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html), and [CUDA](https://developer.nvidia.com/cuda-downloads).
43
+
44
+ 2. Run the following commands:
45
+ ```bash
46
+ git clone https://github.com/dangvansam/viet-tts.git
47
+ cd viet-tts
48
+
49
+ # Build docker images
50
+ docker compose build
51
+
52
+ # Run with Docker Compose; starts the server at http://localhost:8298
53
+ docker compose up -d
54
+
55
+ # Or run with docker run instead; starts the server at http://localhost:8298
56
+ docker run -itd --gpus all -p 8298:8298 -v ./pretrained-models:/app/pretrained-models --name viet-tts-service viet-tts:latest viettts server --host 0.0.0.0 --port 8298
57
+
58
+ # Show available voices
59
+ docker exec viet-tts-service viettts show-voices
60
+ ```
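After `docker compose up -d` or `docker run`, the container may need some time before it accepts requests. The following is a minimal sketch, using only the Python standard library, that polls the published port (`8298`, as in the commands above) until a TCP connection succeeds. The helper name `wait_for_port` and its timeout values are illustrative assumptions, not part of VietTTS. An open port does not guarantee that the model has finished loading, so treat a `True` result as a hint rather than a health check.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that something is listening on the port.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Try to open (and immediately close) a TCP connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval)
    return False


# Example usage (assumes the server was started on the default port):
# wait_for_port("localhost", 8298)
```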
61
+
62
+ ## 🚀 Usage
63
+
64
+ ### Built-in Voices 🤠
65
+ You can use the built-in voices below to synthesize speech.
66
+ <details>
67
+ <summary>Expand</summary>
68
+
69
+ | ID | Voice | Gender | Play Audio |
70
+ |-----|-----------------------|--------|--------------------------------------------------|
71
+ | 1 | nsnd-le-chuc | 👨 | <audio controls src="samples/nsnd-le-chuc.mp3"></audio> |
72
+ | 2 | speechify_10 | 👩 | <audio controls src="samples/speechify_10.wav"></audio> |
73
+ | 3 | atuan | 👨 | <audio controls src="samples/atuan.wav"></audio> |
74
+ | 4 | speechify_11 | 👩 | <audio controls src="samples/speechify_11.wav"></audio> |
75
+ | 5 | cdteam | 👨 | <audio controls src="samples/cdteam.wav"></audio> |
76
+ | 6 | speechify_12 | 👩 | <audio controls src="samples/speechify_12.wav"></audio> |
77
+ | 7 | cross_lingual_prompt | 👩 | <audio controls src="samples/cross_lingual_prompt.wav"></audio> |
78
+ | 8 | speechify_2 | 👩 | <audio controls src="samples/speechify_2.wav"></audio> |
79
+ | 9 | diep-chi | 👨 | <audio controls src="samples/diep-chi.wav"></audio> |
80
+ | 10 | speechify_3 | 👩 | <audio controls src="samples/speechify_3.wav"></audio> |
81
+ | 11 | doremon | 👨 | <audio controls src="samples/doremon.mp3"></audio> |
82
+ | 12 | speechify_4 | 👩 | <audio controls src="samples/speechify_4.wav"></audio> |
83
+ | 13 | jack-sparrow | 👨 | <audio controls src="samples/jack-sparrow.mp3"></audio> |
84
+ | 14 | speechify_5 | 👩 | <audio controls src="samples/speechify_5.wav"></audio> |
85
+ | 15 | nguyen-ngoc-ngan | 👩 | <audio controls src="samples/nguyen-ngoc-ngan.wav"></audio> |
86
+ | 16 | speechify_6 | 👩 | <audio controls src="samples/speechify_6.wav"></audio> |
87
+ | 17 | nu-nhe-nhang | 👩 | <audio controls src="samples/nu-nhe-nhang.wav"></audio> |
88
+ | 18 | speechify_7 | 👩 | <audio controls src="samples/speechify_7.wav"></audio> |
89
+ | 19 | quynh | 👩 | <audio controls src="samples/quynh.wav"></audio> |
90
+ | 20 | speechify_8 | 👩 | <audio controls src="samples/speechify_8.wav"></audio> |
91
+ | 21 | speechify_9 | 👩 | <audio controls src="samples/speechify_9.wav"></audio> |
92
+ | 22 | son-tung-mtp | 👨 | <audio controls src="samples/son-tung-mtp.wav"></audio> |
93
+ | 23 | zero_shot_prompt | 👩 | <audio controls src="samples/zero_shot_prompt.wav"></audio> |
94
+ | 24 | speechify_1 | 👩 | <audio controls src="samples/speechify_1.wav"></audio> |
95
+
96
+ <div>
97
+ </div>
98
+ </details>
99
+
100
+ ### Command Line Interface (CLI)
101
+ The VietTTS Command Line Interface (CLI) allows you to quickly generate speech directly from the terminal. Here's how to use it:
102
+ ```bash
103
+ # Usage
104
+ viettts --help
105
+
106
+ # Start API Server
107
+ viettts server --host 0.0.0.0 --port 8298
108
+
109
+ # Synthesize speech from text
110
+ viettts synthesis --text "Xin chào" --voice 0 --output test.wav
111
+
112
+ # List all built-in voices
113
+ viettts show-voices
114
+ ```
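To synthesize several sentences in one go, the CLI can be driven from a short script. The sketch below only uses the `viettts synthesis` flags shown above (`--text`, `--voice`, `--output`). The function names, the `sentence_NNN.wav` naming scheme, and the `dry_run` switch are assumptions made for illustration. With `dry_run=True` it prints the commands instead of running them, so it can be tried without VietTTS installed.

```python
import shlex
import subprocess
from pathlib import Path


def build_synthesis_command(text: str, voice: str, output: Path) -> list[str]:
    """Build the argv for `viettts synthesis`, using only the flags shown above."""
    return ["viettts", "synthesis", "--text", text, "--voice", str(voice), "--output", str(output)]


def synthesize_all(sentences: list[str], voice: str, out_dir: Path, dry_run: bool = True) -> list[Path]:
    """Synthesize one WAV file per sentence (hypothetical batch helper).

    With dry_run=True the commands are printed rather than executed.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, sentence in enumerate(sentences, start=1):
        output = out_dir / f"sentence_{index:03d}.wav"
        command = build_synthesis_command(sentence, voice, output)
        if dry_run:
            print(shlex.join(command))
        else:
            # Raises CalledProcessError if the CLI exits with a non-zero status.
            subprocess.run(command, check=True)
        outputs.append(output)
    return outputs
```

For example, `synthesize_all(["Xin chào", "Tạm biệt"], "0", Path("out"), dry_run=False)` would call the CLI twice and return the two output paths.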
115
+
116
+ ### API Client
117
+ #### Python (OpenAI Client)
118
+ You need to set environment variables for the OpenAI Client:
119
+ ```bash
120
+ # Set base_url and API key as environment variables
121
+ export OPENAI_BASE_URL=http://localhost:8298
122
+ export OPENAI_API_KEY=viet-tts # not used in the current version
123
+ ```
124
+ To create speech from input text:
125
+ ```python
126
+ from pathlib import Path
127
+ from openai import OpenAI
128
+
129
+ client = OpenAI()
130
+
131
+ output_file_path = Path(__file__).parent / "speech.wav"
132
+
133
+ with client.audio.speech.with_streaming_response.create(
134
+     model='tts-1',
135
+     voice='cdteam',
136
+     input='Xin chào Việt Nam.',
137
+     speed=1.0,
138
+     response_format='wav'
139
+ ) as response:
140
+     response.stream_to_file(output_file_path)
141
+ ```
142
+
143
+ #### CURL
144
+ ```bash
145
+ curl http://localhost:8298/v1/audio/speech \
146
+   -H "Authorization: Bearer viet-tts" \
147
+   -H "Content-Type: application/json" \
148
+   -d '{
149
+     "model": "tts-1",
150
+     "input": "Xin chào Việt Nam.",
151
+     "voice": "son-tung-mtp"
152
+   }' \
153
+   --output speech.wav
154
+ ```
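The same request can be sent from Python without the OpenAI client. The sketch below mirrors the curl call above (endpoint `/v1/audio/speech`, the `Authorization` and `Content-Type` headers, and the `model`, `input`, and `voice` fields) using only the standard library. The function names and default arguments are assumptions for illustration. Building the request is kept separate from sending it so that the payload can be inspected without a running server.

```python
import json
import urllib.request


def build_speech_request(
    text: str,
    voice: str,
    base_url: str = "http://localhost:8298",
    api_key: str = "viet-tts",
) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example (hypothetical helper)."""
    payload = {"model": "tts-1", "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text: str, voice: str, output_path: str) -> None:
    """Send the request and write the returned audio bytes to output_path."""
    request = build_speech_request(text, voice)
    with urllib.request.urlopen(request) as response, open(output_path, "wb") as output:
        output.write(response.read())


# Example usage (assumes the server is running locally):
# synthesize_to_file("Xin chào Việt Nam.", "son-tung-mtp", "speech.wav")
```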
155
+
156
+ #### Node
157
+ ```js
158
+ import fs from "fs";
159
+ import path from "path";
160
+ import OpenAI from "openai";
161
+
162
+ const openai = new OpenAI();
163
+
164
+ const speechFile = path.resolve("./speech.wav");
165
+
166
+ async function main() {
167
+   const response = await openai.audio.speech.create({
168
+     model: "tts-1",
169
+     voice: "1",
170
+     input: "Xin chào Việt Nam.",
171
+   });
172
+   console.log(speechFile);
173
+   const buffer = Buffer.from(await response.arrayBuffer());
174
+   await fs.promises.writeFile(speechFile, buffer);
175
+ }
176
+ main();
177
+ ```
178
+
179
+ ## 🙏 Acknowledgement
180
+ - 💡 Borrowed code from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice)
181
+ - 🎙️ VAD model from [silero-vad](https://github.com/snakers4/silero-vad)
182
+ - 📝 Text normalization with [Vinorm](https://github.com/v-nhandt21/Vinorm)
183
+
184
+ ## 📜 License
185
+ The **VietTTS** source code is released under the **Apache 2.0 License**. Pre-trained models and audio samples are licensed under the **CC BY-NC License** because they are based on an in-the-wild dataset. We apologize for any inconvenience this may cause.
186
+
187
+ ## ⚠️ Disclaimer
188
+ The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.
189
+
190
+ ## 💬 Contact
191
+ - Facebook: https://fb.com/sam.rngd
192
+ - GitHub: https://github.com/dangvansam
193
+ - Email: dangvansam98@gmail.com