SazerLife committed on
Commit
38a69d2
·
1 Parent(s): 36a67ca

doc: added Readme

Files changed (1)
  1. README.md +39 -1
README.md CHANGED
@@ -3,4 +3,42 @@ license: mit
  language:
  - ru
  pipeline_tag: audio-to-audio
- ---
+ ---
+
+ # DINO-HuVITS
+
+ ## Info
+ This model is inspired by the paper [DINO-VITS](https://arxiv.org/abs/2311.09770).
+ It is built on the VITS architecture, with the original `PosteriorEncoder` replaced by a [HuBERT Base](https://arxiv.org/abs/2106.07447) model and the `SpeakerEncoder` trained with the [DINO](https://arxiv.org/abs/2304.05754) loss function.
+
+ ## Quick start
+
+ ```python
+ import librosa
+ import torch
+
+ from dino_huvits import DinoHuVits
+
+
+ model = DinoHuVits.from_pretrained("SazerLife/DINO-HuVITS")
+ model = model.eval()  # inference mode
+
+ content, _ = librosa.load("<content-path>", sr=16000)  # resampled to 16 kHz
+ reference, _ = librosa.load("<reference-path>", sr=16000)
+
+ content = torch.from_numpy(content).unsqueeze(0)  # shape: (1, num_samples)
+ lengths = torch.tensor([content.shape[1]], dtype=torch.long)
+ reference = torch.from_numpy(reference).unsqueeze(0)
+
+ with torch.no_grad():
+     output, _ = model(content, lengths, reference)
+ ```
+
+ ## Datasets
+ - [Common Voice](https://commonvoice.mozilla.org/ru)
+ - [VoxForge](https://github.com/vlomme/Multi-Tacotron-Voice-Cloning)
+ - [M-AILABS](https://github.com/vlomme/Multi-Tacotron-Voice-Cloning)
+ - [VoxTube](https://github.com/IDRnD/VoxTube)
+ - [Golos](https://github.com/sberdevices/golos)
+ - [OpenSTT](https://github.com/snakers4/open_stt)
+ - [Sova](https://github.com/sovaai/sova-dataset)
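
The Quick start in this README passes a `lengths` tensor alongside `content`, which suggests the model may accept right-padded batches. The commit only demonstrates a single utterance, so batching is an assumption here. As a minimal sketch under that assumption, the hypothetical helper `pad_batch` pads waveforms to a common length and records their true lengths. It is written in plain Python so it runs without torch or librosa, and it is not part of `dino_huvits`.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    waveforms: Sequence[Sequence[float]], pad_value: float = 0.0
) -> Tuple[List[List[float]], List[int]]:
    """Right-pad every waveform to the longest one.

    Hypothetical helper for illustration; returns (padded, lengths),
    where lengths holds the original sample counts before padding.
    """
    if not waveforms:
        raise ValueError("expected at least one waveform")
    lengths = [len(w) for w in waveforms]
    max_len = max(lengths)
    # Append pad_value until each row reaches max_len samples.
    padded = [list(w) + [pad_value] * (max_len - len(w)) for w in waveforms]
    return padded, lengths


padded, lengths = pad_batch([[0.1, 0.2, 0.3], [0.5]])
```

The returned lists could then be wrapped with `torch.tensor(padded)` and `torch.tensor(lengths, dtype=torch.long)`, mirroring the `(batch, num_samples)` shape used in the Quick start. Whether `DinoHuVits` handles more than one utterance per call is not confirmed by the commit.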