metadata
language:
- ru
tags:
- vits
license: cc-by-nc-4.0
pipeline_tag: text-to-speech
widget:
- example_title: text to speech
  text: |
    прив+ет, как дел+а? всё +очень хорош+о! а у тебя как?
VITS Text-to-Speech Model for Russian
The model accepts lowercase text only, so lowercase the input before tokenizing it.
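The lowercase requirement can be wrapped in a small preprocessing helper. The sketch below is only an illustration: `prepare_text` is a hypothetical name, not part of this model or of `transformers`, and collapsing whitespace is an optional convenience rather than a documented requirement. In the example texts a `+` appears directly before a vowel, apparently to mark stress; the helper leaves those marks untouched.

```python
import re


def prepare_text(text: str) -> str:
    """Lowercase the input and collapse whitespace runs; '+' marks are kept."""
    # str.lower() handles Cyrillic and leaves '+' and punctuation unchanged
    lowered = text.lower()
    # Assumption: single spaces are preferable to newlines or repeated spaces
    return re.sub(r"\s+", " ", lowered).strip()


if __name__ == "__main__":
    print(prepare_text("Привет,  как дел+а?\nВсё +очень хорош+о!"))
```

The returned string can be passed to the tokenizer in place of the manual `text.lower()` call.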
Example Text to Speech
from transformers import VitsModel, AutoTokenizer
import torch
import scipy.io.wavfile
model = VitsModel.from_pretrained("joefox/tts_vits_ru_hf")
tokenizer = AutoTokenizer.from_pretrained("joefox/tts_vits_ru_hf")
text = "Привет, как дел+а? Всё +очень хорош+о! А у тебя как?"
text = text.lower()
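inputs = tokenizer(text, return_tensors="pt")
# Select the speaker voice by its integer ID
inputs['speaker_id'] = 3
# Run inference without tracking gradients
with torch.no_grad():
    output = model(**inputs).waveform
# Save the first waveform in the batch as a WAV file
scipy.io.wavfile.write("techno.wav", rate=model.config.sampling_rate, data=output[0].cpu().numpy())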
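When given a floating-point array, `scipy.io.wavfile.write` stores the samples as a floating-point WAV, which some players may not open. If that becomes a problem, one option is to write 16-bit PCM instead. The sketch below uses only the Python standard library. `write_pcm16` is a hypothetical helper, and it assumes the waveform values lie roughly in [-1.0, 1.0], clipping anything outside that range.

```python
import struct
import wave


def write_pcm16(path, samples, sample_rate):
    """Write mono float samples (expected in [-1.0, 1.0]) as 16-bit PCM WAV."""
    frames = bytearray()
    for s in samples:
        # Clip out-of-range values so they cannot overflow the int16 range
        s = max(-1.0, min(1.0, float(s)))
        # Scale to int16 and pack as little-endian signed 16-bit
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
```

With the variables from the example above, a call could look like `write_pcm16("techno_pcm16.wav", output[0].cpu().tolist(), model.config.sampling_rate)`.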
To play the audio in a Jupyter Notebook or Google Colab:
from IPython.display import Audio
# Convert the tensor to a NumPy array before handing it to IPython
Audio(output.cpu().numpy(), rate=model.config.sampling_rate)
Languages covered
Russian (ru_RU)