---
license: mit
language:
- ru
pipeline_tag: audio-to-audio
---

# DINO-HuVITS

## Info

The development of this model was inspired by the paper [DINO-VITS](https://arxiv.org/abs/2311.09770).
It is built on the VITS architecture, with the original `PosteriorEncoder` replaced by a [HuBERT Base](https://arxiv.org/abs/2106.07447) model, and the `SpeakerEncoder` trained with the [DINO](https://arxiv.org/abs/2304.05754) loss function.
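DINO refers to self-distillation between a student and an EMA teacher network fed different views of the same input (here, of the same speaker). As an illustration only, a minimal sketch of such a loss in PyTorch follows; the function name, temperatures, and embedding size are assumptions, not this repository's training code:

```python
import torch
import torch.nn.functional as F

def dino_style_loss(student_logits, teacher_logits,
                    student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the teacher's sharpened soft labels and the
    student's log-probabilities (the core of a DINO-style objective)."""
    # Teacher output is sharpened with a low temperature and never backpropagated
    teacher_probs = F.softmax(teacher_logits / teacher_temp, dim=-1).detach()
    student_log_probs = F.log_softmax(student_logits / student_temp, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

# Two augmented "views" of the same speaker would produce these embeddings
student = torch.randn(4, 256)
teacher = torch.randn(4, 256)
loss = dino_style_loss(student, teacher)
```

In the full scheme the teacher's weights are an exponential moving average of the student's, so no speaker labels are required.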

## Quick start

```python
import librosa
import torch

from dino_huvits import DinoHuVits

# Load the pretrained model from the Hugging Face Hub
model = DinoHuVits.from_pretrained("SazerLife/DINO-HuVITS")
model = model.eval()

# Both waveforms must be sampled at 16 kHz (the HuBERT input rate)
content, _ = librosa.load("<content-path>", sr=16000)
reference, _ = librosa.load("<reference-path>", sr=16000)

# Add a batch dimension and record the content length in samples
content = torch.from_numpy(content).unsqueeze(0)
lengths = torch.tensor([content.shape[1]], dtype=torch.long)
reference = torch.from_numpy(reference).unsqueeze(0)

with torch.no_grad():
    output, _ = model(content, lengths, reference)
```
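The tensor preparation above can be checked on a dummy signal, with no model download needed (the zero array simply stands in for `librosa.load` output):

```python
import numpy as np
import torch

# A fake 2-second mono signal at 16 kHz in place of a loaded file
content = np.zeros(32000, dtype=np.float32)

# Add a batch dimension: (num_samples,) -> (1, num_samples)
content = torch.from_numpy(content).unsqueeze(0)  # torch.Size([1, 32000])

# Per-item length in samples, as a long tensor
lengths = torch.tensor([content.shape[1]], dtype=torch.long)  # [32000]
```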

## Datasets

- [Common Voice](https://commonvoice.mozilla.org/ru)
- [VoxForge](https://github.com/vlomme/Multi-Tacotron-Voice-Cloning)
- [M-AILABS](https://github.com/vlomme/Multi-Tacotron-Voice-Cloning)
- [VoxTube](https://github.com/IDRnD/VoxTube)
- [Golos](https://github.com/sberdevices/golos)
- [OpenSTT](https://github.com/snakers4/open_stt)
- [Sova](https://github.com/sovaai/sova-dataset)