Commit 0feb3fd
1 Parent(s): a91e656
Update README.md (#1)
- Update README.md (b62e72bf0928bcf6c7ad418a4204007f0d8d7b1d)
Co-authored-by: Vaibhav Srivastav <reach-vb@users.noreply.huggingface.co>
README.md CHANGED
@@ -1,3 +1,71 @@

---
license: mit
---

# Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

[Audio samples](https://charactr-platform.github.io/vocos/) |
Paper [[abs]](https://arxiv.org/abs/2306.00814) [[pdf]](https://arxiv.org/pdf/2306.00814.pdf)

Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative
Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike typical
GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral
coefficients, facilitating rapid audio reconstruction through the inverse Fourier transform.
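
As a rough illustration of why predicting spectral coefficients allows fast reconstruction, the sketch below round-trips one short frame through a forward and an inverse discrete Fourier transform using only the Python standard library. It is not Vocos code: the helper names `dft` and `inverse_dft`, the 16-sample frame, and the sine test signal are all assumptions made for this example, and a real vocoder would use an optimized inverse STFT over many overlapping frames.

```python
import cmath
import math


def dft(frame):
    """Forward DFT: time-domain samples -> complex spectral coefficients."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse DFT: complex spectral coefficients -> real time-domain samples."""
    n = len(coeffs)
    return [
        (sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# Hypothetical input: one 16-sample frame of a sine wave stands in for real audio.
frame = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]

# The coefficients play the role of what a Fourier-based model would output.
coeffs = dft(frame)

# A fixed transform maps the coefficients back to the waveform.
reconstructed = inverse_dft(coeffs)
max_error = max(abs(a - b) for a, b in zip(frame, reconstructed))
```

Once the coefficients are available, the waveform follows from a fixed transform, so no further learned layers are needed for that last step.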

## Installation

To use Vocos only in inference mode, install it using:

```bash
pip install vocos
```

If you wish to train the model, install it with additional dependencies:

```bash
pip install vocos[train]
```

## Usage

### Reconstruct audio from mel-spectrogram

```python
import torch

from vocos import Vocos

# Load the pretrained 24 kHz mel-spectrogram model
vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")

mel = torch.randn(1, 100, 256)  # B, C, T: batch, mel channels, frames
audio = vocos.decode(mel)
```
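
To get a feel for how much audio the `T` frames in the example correspond to, the following back-of-the-envelope helper converts a frame count to seconds. The hop length of 256 samples is an assumption about the `charactr/vocos-mel-24khz` configuration, not something stated above, and `approx_duration_seconds` is a hypothetical helper rather than part of the `vocos` package. The exact output length also depends on how the model pads frames, so treat the result as an estimate.

```python
def approx_duration_seconds(num_frames: int, hop_length: int = 256, sample_rate: int = 24000) -> float:
    """Estimate audio duration from a frame count.

    hop_length=256 is an assumed value for the 24 kHz mel model;
    check the model configuration before relying on it.
    """
    return num_frames * hop_length / sample_rate


# T = 256 frames, as in the random mel-spectrogram above
duration = approx_duration_seconds(256)
```

Under these assumptions, 256 frames correspond to a little under three seconds of audio.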

Copy-synthesis from a file:

```python
import torchaudio

from vocos import Vocos

vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")

y, sr = torchaudio.load(YOUR_AUDIO_FILE)  # replace with the path to your audio file
if y.size(0) > 1:  # mix to mono
    y = y.mean(dim=0, keepdim=True)
y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)  # match the 24 kHz model
y_hat = vocos(y)  # extract features and resynthesize the waveform
```
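
The mono mixdown step above averages the channels sample by sample while keeping a leading channel dimension. A minimal pure-Python sketch of the same idea, using nested lists in place of tensors, might look like this. The function name `mix_to_mono` and the sample values are made up for illustration and are not part of `vocos` or `torchaudio`.

```python
def mix_to_mono(channels):
    """Average equal-length channels into one channel.

    The result keeps a single-row outer list, mirroring keepdim=True.
    """
    num_channels = len(channels)
    return [[sum(samples) / num_channels for samples in zip(*channels)]]


# Hypothetical stereo signal: 2 channels, 3 samples each
stereo = [[0.5, 1.0, -1.0], [0.5, 0.0, 1.0]]
mono = mix_to_mono(stereo)
```

Keeping the channel dimension means the mono signal has the same rank as the original input, so later steps such as resampling can treat both cases uniformly.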

## Citation

If this code contributes to your research, please cite our work:

```
@article{siuzdak2023vocos,
  title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
  author={Siuzdak, Hubert},
  journal={arXiv preprint arXiv:2306.00814},
  year={2023}
}
```

## License

The code in this repository is released under the MIT license.