metadata

language:
  - ru
tags:
  - vits
license: cc-by-nc-4.0
pipeline_tag: text-to-speech
widget:
  - example_title: text to speech
    text: |
      прив+ет, как дел+а? всё +очень хорош+о! а у тебя как?

VITS text-to-speech model for Russian

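The model expects lowercase input text. In the examples here, a "+" placed directly before a vowel marks the stressed syllable.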
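The input convention can be checked before synthesis with a small helper. This is a minimal sketch, not part of the model's API: `prepare_text`, `misplaced_stress_marks`, and the `RUSSIAN_VOWELS` constant are hypothetical names, and the rule that "+" must directly precede a vowel is an assumption inferred from the example sentences.

```python
import re
from typing import List

# Assumption: the ten Russian vowel letters, lowercase only.
RUSSIAN_VOWELS = "аеёиоуыэюя"


def prepare_text(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace (hypothetical helper)."""
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def misplaced_stress_marks(text: str) -> List[int]:
    """Return the index of every '+' that is not directly followed by a vowel."""
    return [
        i
        for i, ch in enumerate(text)
        if ch == "+" and (i + 1 >= len(text) or text[i + 1] not in RUSSIAN_VOWELS)
    ]
```

Call `prepare_text` first, since `misplaced_stress_marks` only recognizes lowercase vowels; an empty list means every stress mark sits before a vowel.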

Example Text to Speech

from transformers import VitsModel, AutoTokenizer
import torch
from scipy.io import wavfile

# Load the pretrained model and its tokenizer from the Hugging Face Hub
model = VitsModel.from_pretrained("joefox/tts_vits_ru_hf")
tokenizer = AutoTokenizer.from_pretrained("joefox/tts_vits_ru_hf")

text = "Привет, как дел+а? Всё +очень хорош+о! А у тебя как?"
text = text.lower()  # the model expects lowercase text
inputs = tokenizer(text, return_tensors="pt")
inputs["speaker_id"] = 3  # select the speaker voice

# Run inference without tracking gradients
with torch.no_grad():
    output = model(**inputs).waveform

# Save the first waveform in the batch as a WAV file
wavfile.write("techno.wav", rate=model.config.sampling_rate, data=output[0].cpu().numpy())
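When given a floating-point array, `scipy.io.wavfile.write` stores the samples as a floating-point WAV, which some players may not open. If that is a concern, the waveform can be saved as 16-bit PCM instead. The following is a minimal sketch using only the Python standard library; `write_pcm16_wav` is a hypothetical helper, and it assumes a mono signal with samples roughly in the range [-1.0, 1.0].

```python
import struct
import wave
from typing import Iterable


def write_pcm16_wav(path: str, samples: Iterable[float], sample_rate: int) -> None:
    """Write mono float samples as a 16-bit PCM WAV file (hypothetical helper)."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip values outside [-1.0, 1.0]
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
```

With the variables from the example above, this could be called as `write_pcm16_wav("techno.wav", output[0].cpu().tolist(), model.config.sampling_rate)`.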

To play the audio in a Jupyter Notebook or Google Colab:

from IPython.display import Audio

# Pass the first waveform in the batch as a NumPy array
Audio(output[0].cpu().numpy(), rate=model.config.sampling_rate)

Languages covered

Russian (ru_RU)