metadata
license: mit
language:
  - ko
pipeline_tag: text-to-speech

MeloTTS

MeloTTS is a high-quality multilingual text-to-speech library by MyShell.ai. Supported languages include:

Model card Example
English (American) Link
English (British) Link
English (Indian) Link
English (Australian) Link
English (Default) Link
Spanish Link
French Link
Chinese (mix EN) Link
Japanese Link
Korean Link

Some other features include:

  • The Chinese speaker supports mixed Chinese and English.
  • Fast enough for CPU real-time inference.
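"Real-time" here is usually read as a real-time factor (synthesis time divided by the duration of the generated audio) below 1. The sketch below is one way to measure that on your own machine; it is not part of MeloTTS. The helper names `wav_duration_seconds` and `real_time_factor` are hypothetical, and the code assumes the output is a PCM WAV file that Python's standard `wave` module can read.

```python
import time
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, 'rb') as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesize, output_path):
    """Time synthesize(output_path) and divide by the audio it produced.

    `synthesize` is any callable that writes a WAV file to `output_path`
    (hypothetical helper, not a MeloTTS API). A result below 1.0 means the
    audio was generated faster than it takes to play back.
    """
    start = time.perf_counter()
    synthesize(output_path)
    elapsed = time.perf_counter() - start
    return elapsed / wav_duration_seconds(output_path)
```

With the `model`, `text`, `speaker_ids`, and `speed` names from the local usage snippet in the Usage section, a call could look like `real_time_factor(lambda p: model.tts_to_file(text, speaker_ids['EN-US'], p, speed=speed), 'en-us.wav')`. The first call may include model warm-up, so timing a second run is likely more representative.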

Usage

Without Installation

An unofficial live demo is hosted on Hugging Face Spaces.

Use it on MyShell

MyShell hosts hundreds of TTS models, far more than MeloTTS alone. See examples here. More can be found at the widget center of MyShell.ai.

Install and Use Locally

Follow the installation steps here before using the following snippet:

from melo.api import TTS

# Speed is adjustable
speed = 1.0

# CPU is sufficient for real-time inference.
# You can set it manually to 'cpu' or 'cuda' or 'cuda:0' or 'mps'
device = 'auto' # Will automatically use GPU if available

# English 
text = "Did you ever hear a folk tale about a giant turtle?"
model = TTS(language='EN_V2', device=device)
speaker_ids = model.hps.data.spk2id

# American accent
output_path = 'en-us.wav'
model.tts_to_file(text, speaker_ids['EN-US'], output_path, speed=speed)

# British accent
output_path = 'en-br.wav'
model.tts_to_file(text, speaker_ids['EN-BR'], output_path, speed=speed)

# Indian accent
output_path = 'en-india.wav'
model.tts_to_file(text, speaker_ids['EN_INDIA'], output_path, speed=speed)

# Australian accent
output_path = 'en-au.wav'
model.tts_to_file(text, speaker_ids['EN-AU'], output_path, speed=speed)

# Default accent
output_path = 'en-default.wav'
model.tts_to_file(text, speaker_ids['EN-Default'], output_path, speed=speed)
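The five calls above differ only in the speaker key and the output filename, so they can be folded into a loop. The following is a minimal sketch, not part of the MeloTTS API: `accent_filename` and `synthesize_all` are hypothetical helpers, and the code assumes `speaker_ids` behaves like a mapping with an `.items()` method and that `model` exposes `tts_to_file` with the signature used in the snippet above.

```python
from pathlib import Path


def accent_filename(speaker):
    """Map a speaker key such as 'EN_INDIA' to a filename like 'en-india.wav'."""
    return speaker.lower().replace('_', '-') + '.wav'


def synthesize_all(model, text, speaker_ids, out_dir='.', speed=1.0):
    """Write one WAV file per speaker and return a dict of speaker -> Path.

    Assumes `speaker_ids` supports .items() and that `model.tts_to_file`
    accepts (text, speaker_id, output_path, speed=...).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for speaker, speaker_id in speaker_ids.items():
        path = out_dir / accent_filename(speaker)
        model.tts_to_file(text, speaker_id, str(path), speed=speed)
        paths[speaker] = path
    return paths
```

With the variables defined in the snippet above, `synthesize_all(model, text, speaker_ids, speed=speed)` would produce the same filenames (`en-us.wav`, `en-br.wav`, `en-india.wav`, `en-au.wav`, `en-default.wav`) in the current directory.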

Join the Community

Open Source AI Grant

We are actively sponsoring open-source AI projects. The sponsorship includes GPU resources, funding, and intellectual support (collaboration with top research labs). We welcome both research and engineering projects, as long as the open-source community needs them. Please contact Zengyi Qin if you are interested.

Contributing

If you find this work useful, please consider contributing to the GitHub repo.

  • Many thanks to @fakerybakery for adding the Web UI and CLI part.

License

This library is released under the MIT License, which means it is free for both commercial and non-commercial use.

Acknowledgements

This implementation is based on TTS, VITS, VITS2 and Bert-VITS2. We appreciate their awesome work.