Commit 0e8882b by anteju (1 parent: 2cbec8f)

Update README.md

Files changed (1): README.md (+112, -3)
 
---
license: cc-by-nc-sa-4.0
library_name: NeMo
tags:
- NeMo
- speech
- audio
---
# SR SSL FlowMatching 16kHz 430M

## Model Overview

### Description

This is a generative speech restoration model based on flow matching [1]. The model is pretrained on the publicly available Libri-Light dataset using a self-supervised learning technique. It can be finetuned on various speech restoration tasks, such as speech denoising, bandwidth extension, and codec artifact removal, for human or machine listeners.

This model is for research and development only.

### License/Terms of Use
Use of this model is governed by the [CC-BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0) license. By downloading the public release version of the model, you accept the terms and conditions of the [CC-BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0) license.

## References

[1] [Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration](https://arxiv.org/abs/2409.16117), 2024.

## Model Architecture
**Architecture Type:** Conditional Flow Matching <br>
**Network Architecture:** Transformer <br>

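For orientation, a generic conditional flow matching objective with a straight (linear) interpolation path is written out below. This is the textbook formulation, not necessarily the exact one used by this model: the symbols are assumptions for illustration, with $x_1$ the clean speech representation, $x_0$ Gaussian noise, $y$ the conditioning input (for example a degraded or masked signal), and $v_\theta$ the Transformer network. See [1] for the model's actual formulation.

$$
x_t = (1 - t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I), \quad t \sim \mathcal{U}[0, 1]
$$

$$
\frac{\mathrm{d}x_t}{\mathrm{d}t} = x_1 - x_0, \qquad
\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1,\,y}\,\big\lVert v_\theta(x_t, t, y) - (x_1 - x_0) \big\rVert_2^2
$$

At inference time, under this formulation, a sample is produced by integrating $\mathrm{d}x/\mathrm{d}t = v_\theta(x, t, y)$ from $t = 0$ to $t = 1$ with an ODE solver, starting from noise $x_0$.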
## Input
**Input Type(s):** Audio <br>
**Input Format(s):** .wav files <br>
**Input Parameters:** One-Dimensional (1D) <br>
**Other Properties Related to Input:** 16000 Hz Mono-channel Audio <br>

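Because the model expects 16000 Hz mono-channel audio, it can help to check files before passing them in. The sketch below uses only Python's standard `wave` module; the helper name `check_input_wav` is hypothetical, and note that `wave` reads only integer PCM WAV files (float WAVs will raise `wave.Error`).

```python
import wave

EXPECTED_RATE = 16000  # sample rate stated in this model card
EXPECTED_CHANNELS = 1  # mono


def check_input_wav(path: str) -> list[str]:
    """Return a list of format problems; an empty list means 16 kHz mono."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS} (mono)")
    return problems
```

Files that fail the check would need to be resampled and/or downmixed with an audio library before use.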
## Output
**Output Type(s):** Audio <br>
**Output Format:** .wav files <br>
**Output Parameters:** One-Dimensional (1D) <br>
**Other Properties Related to Output:** 16000 Hz Mono-channel Audio <br>

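To save restored audio in the stated output format (16000 Hz, mono), a dependency-free sketch with the standard `wave` and `struct` modules is shown below. It assumes the model output has already been converted to a flat list of floats in [-1, 1] (for a PyTorch tensor, something like `.squeeze().cpu().tolist()`); the helper name `write_mono_wav` is hypothetical.

```python
import struct
import wave

SAMPLE_RATE = 16000  # output sample rate stated in this model card


def write_mono_wav(path: str, samples: list[float], sample_rate: int = SAMPLE_RATE) -> None:
    """Write float samples in [-1, 1] to a mono 16-bit PCM WAV file."""
    # Clip to [-1, 1], then scale to the signed 16-bit integer range
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        # Little-endian signed shorts, as required by the WAV format
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

Libraries such as `soundfile` can also write float WAVs directly; 16-bit PCM is used here only to keep the sketch within the standard library.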
## Software Integration
**Runtime Engine(s):**<br>
* NeMo-2.0.0 <br>

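A quick way to confirm the runtime engine is to read the installed NeMo version with the standard `importlib.metadata` module. This is a sketch under two assumptions: that NeMo is installed under the pip distribution name `nemo_toolkit`, and that the listed NeMo-2.0.0 is meant as a minimum version. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_NEMO = (2, 0, 0)  # runtime engine listed in this model card (assumed minimum)


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse the leading numeric major.minor.patch, e.g. '2.0.0rc1' -> (2, 0, 0)."""
    parts: list[int] = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad missing components with zeros so '2.0' compares equal to (2, 0, 0)
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def nemo_is_supported(dist_name: str = "nemo_toolkit") -> bool:
    """Return True if the installed distribution is at least MIN_NEMO."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= MIN_NEMO
```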
**Supported Hardware Microarchitecture Compatibility:** <br>
* NVIDIA Ampere<br>
* NVIDIA Blackwell<br>
* NVIDIA Jetson<br>
* NVIDIA Hopper<br>
* NVIDIA Lovelace<br>
* NVIDIA Turing<br>
* NVIDIA Volta<br>

**Preferred Operating System(s):** <br>
* Linux<br>
* Windows<br>

## Model Version(s)
`sr_ssl_flowmatching_16k_430m_v1.0`<br>

# Training, Testing, and Evaluation Datasets

## Training Dataset
**Link:**
[Libri-Light](https://github.com/facebookresearch/libri-light)

**Data Collection Method by dataset:** Human <br>

**Labeling Method by dataset:** Not Applicable<br>

**Properties (Quantity, Dataset Descriptions, Sensor(s)):**
Approximately 60k hours of English speech data <br>

## Testing Dataset
**Link:** Not Applicable<br>

## Evaluation Dataset
**Link:** Not Applicable<br>

## Inference
**Engine:** NeMo 2.0 <br>

**Test Hardware:** NVIDIA H100<br>

# How to use this model

The model is available in the NVIDIA NeMo toolkit and can be finetuned on various speech tasks.

## Load the model
```python
from nemo.collections.audio.models import AudioToAudioModel

# Download (if not cached) and restore the pretrained model
model = AudioToAudioModel.from_pretrained('sr_ssl_flowmatching_16k_430m')
```

## Change sampler configuration
```python
model.sampler.num_steps = 20  # default is 50 steps
```

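This card does not document the sampler internals. As a rough illustration of what `num_steps` controls, the sketch below is a generic fixed-step Euler solver for a flow ODE $\mathrm{d}x/\mathrm{d}t = v(x, t)$ from $t = 0$ to $t = 1$. It is an assumption that `model.sampler` behaves like this (it may use a different solver), and the names `euler_sample` and `velocity` are hypothetical; a toy velocity field stands in for the network.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    if num_steps < 1:
        raise ValueError("num_steps must be a positive integer")
    dt = 1.0 / num_steps
    x = list(x0)
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)  # one velocity (network) evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    import math

    # Toy field dx/dt = x with x(0) = 1, whose exact value at t = 1 is e
    toy = lambda x, t: list(x)
    for steps in (20, 50):
        approx = euler_sample(toy, [1.0], steps)[0]
        print(steps, approx, abs(approx - math.e))
```

Under this assumption each step costs one forward pass of the network, so 20 steps need 40% of the network evaluations of the default 50. On the toy field, the 50-step result lands closer to the exact value than the 20-step result, which illustrates the usual speed/accuracy trade-off.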
## Finetuning
For finetuning, use `init_from_nemo_model` to provide a path to a local NeMo model, or `init_from_pretrained_model` to download a pretrained NeMo model.
For example, add the following to the finetuning configuration:
```yaml
init_from_pretrained_model: sr_ssl_flowmatching_16k_430m
```
An example of a finetuning configuration can be found in [NeMo](https://github.com/NVIDIA/NeMo/blob/main/examples/audio/conf/flow_matching_generative_finetuning.yaml).

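When finetuning configs are generated or edited programmatically, a small guard can make sure exactly one initialization source is chosen. The sketch below is a hypothetical helper, not part of NeMo: it assumes the two keys are meant to be mutually exclusive, and NeMo applies its own handling when it reads them.

```python
from typing import Any, Mapping, Optional, Tuple

# The two initialization keys described in this model card
INIT_KEYS = ("init_from_nemo_model", "init_from_pretrained_model")


def resolve_init_source(cfg: Mapping[str, Any]) -> Optional[Tuple[str, Any]]:
    """Return (key, value) for the single init key set in cfg, or None if none is set."""
    chosen = [(key, cfg[key]) for key in INIT_KEYS if cfg.get(key) is not None]
    if len(chosen) > 1:
        names = [key for key, _ in chosen]
        raise ValueError(f"set only one of {INIT_KEYS}, got {names}")
    return chosen[0] if chosen else None
```

For example, `resolve_init_source({"init_from_pretrained_model": "sr_ssl_flowmatching_16k_430m"})` returns the pair `("init_from_pretrained_model", "sr_ssl_flowmatching_16k_430m")`.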
# Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).