Mirco committed
Commit 14e4238
1 Parent(s): a17ddbe

upload model

Files changed (8)
  1. .gitattributes +3 -0
  2. README.md +122 -0
  3. classifier.ckpt +3 -0
  4. embedding_model.ckpt +3 -0
  5. hyperparams.yaml +60 -0
  6. label_encoder.txt +14 -0
  7. normalizer.ckpt +3 -0
  8. yes.wav +0 -0
.gitattributes CHANGED
@@ -14,3 +14,6 @@
  *.pb filter=lfs diff=lfs merge=lfs -text
  *.pt filter=lfs diff=lfs merge=lfs -text
  *.pth filter=lfs diff=lfs merge=lfs -text
+ classifier.ckpt filter=lfs diff=lfs merge=lfs -text
+ embedding_model.ckpt filter=lfs diff=lfs merge=lfs -text
+ normalizer.ckpt filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,122 @@
+ ---
+ language: "en"
+ thumbnail:
+ tags:
+ - embeddings
+ - Speaker
+ - Verification
+ - Identification
+ - pytorch
+ - xvectors
+ - TDNN
+ license: "apache-2.0"
+ datasets:
+ - voxceleb
+ metrics:
+ - EER
+ - min_dct
+ ---
+
+ <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
+ <br/><br/>
+
+ # Speaker Verification with xvector embeddings on VoxCeleb
+
+ This repository provides all the necessary tools to extract speaker embeddings with a pretrained TDNN model using SpeechBrain.
+ The system is trained on VoxCeleb1 + VoxCeleb2 training data.
+
+ For a better experience, we encourage you to learn more about
+ [SpeechBrain](https://speechbrain.github.io). The model's performance on the VoxCeleb1-test set (Cleaned) is:
+
+ | Release | EER(%) |
+ |:-------------:|:--------------:|
+ | 05-03-21 | 3.2 |
+
+
+ ## Pipeline description
+ This system is composed of a TDNN model coupled with statistics pooling. The system is trained with Categorical Cross-Entropy Loss.
+
+ ## Install SpeechBrain
+
+ First of all, please install SpeechBrain with the following command:
+
+ ```
+ pip install speechbrain
+ ```
+
+ Please note that we encourage you to read our tutorials and learn more about
+ [SpeechBrain](https://speechbrain.github.io).
+
+ ### Compute your speaker embeddings
+
+ ```python
+ import torchaudio
+ from speechbrain.pretrained import EncoderClassifier
+ classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb", savedir="pretrained_models/spkrec-xvect-voxceleb")
+ signal, fs = torchaudio.load('samples/audio_samples/example1.wav')
+ embeddings = classifier.encode_batch(signal)
+ ```
+
+ ### Inference on GPU
+ To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
+
+ ### Training
+ The model was trained with SpeechBrain (aa018540).
+ To train it from scratch, follow these steps:
+ 1. Clone SpeechBrain:
+ ```bash
+ git clone https://github.com/speechbrain/speechbrain/
+ ```
+ 2. Install it:
+ ```
+ cd speechbrain
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ 3. Run Training:
+ ```
+ cd recipes/VoxCeleb/SpeakerRec/
+ python train_speaker_embeddings.py hparams/train_x_vectors.yaml --data_folder=your_data_folder
+ ```
+
+ You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1RtCBJ3O8iOCkFrJItCKT9oL-Q1MNCwMH?usp=sharing).
+
+ ### Limitations
+ The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
+
+ #### Referencing xvectors
+ ```
+ @inproceedings{DBLP:conf/odyssey/SnyderGMSPK18,
+ author = {David Snyder and
+ Daniel Garcia{-}Romero and
+ Alan McCree and
+ Gregory Sell and
+ Daniel Povey and
+ Sanjeev Khudanpur},
+ title = {Spoken Language Recognition using X-vectors},
+ booktitle = {Odyssey 2018},
+ pages = {105--111},
+ year = {2018},
+ }
+ ```
+
+ #### Referencing SpeechBrain
+
+ ```
+ @misc{SB2021,
+ author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
+ title = {SpeechBrain},
+ year = {2021},
+ publisher = {GitHub},
+ journal = {GitHub repository},
+ howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+ }
+ ```
+
+ #### About SpeechBrain
+ SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.
+
+ Website: https://speechbrain.github.io/
+
+ GitHub: https://github.com/speechbrain/speechbrain
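The README extracts embeddings but does not show how two of them would be compared for verification. The following is a minimal sketch, assuming cosine similarity as the scoring function and a hypothetical decision threshold; neither is stated in this commit. Embeddings are represented as plain Python lists rather than the tensors returned by `encode_batch`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot score a zero-norm embedding")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.25):
    """Accept the trial if the score reaches the threshold.

    The default threshold is a placeholder (an assumption, not a value
    from this repository); in practice it would be tuned on held-out trials.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold
```

A threshold chosen this way trades false accepts against false rejects; the EER reported in the README is the operating point where the two error rates are equal.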
classifier.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9cfda9e191640d9e7c67f9c5a2dd60d3beddb917bbc0a70c0e5a8d8d32b6b14b
+ size 1096813
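The checkpoint files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the content hash, and the size in bytes. As an illustration of that layout, here is a small sketch that reads such a pointer into a dictionary. The function name `parse_lfs_pointer` is hypothetical and this is not the Git LFS client's own parser.

```python
def parse_lfs_pointer(text):
    """Parse the three-line Git LFS pointer format into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:9cfda9e191640d9e7c67f9c5a2dd60d3beddb917bbc0a70c0e5a8d8d32b6b14b
size 1096813
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field is the byte size of the real file, so `classifier.ckpt` is about 1.1 MB even though the diff shows only three added lines.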
embedding_model.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:35736f2cc0e6d7562ef2f4c805ee58a91308871bbe76e0cc3efd1c443e114e89
+ size 16887676
hyperparams.yaml ADDED
@@ -0,0 +1,60 @@
+ # ############################################################################
+ # Model: xvector for Command Recognition with Google Speech Commands
+ # ############################################################################
+
+ # Pretrain folder (HuggingFace)
+ pretrained_path: speechbrain/google_speech_command_xvector
+
+ # Feature parameters
+ n_mels: 24
+
+ # Output parameters
+ out_n_neurons: 12 # 12 command version
+
+
+ # Model params
+ compute_features: !new:speechbrain.lobes.features.Fbank
+     n_mels: !ref <n_mels>
+
+ mean_var_norm: !new:speechbrain.processing.features.InputNormalization
+     norm_type: sentence
+     std_norm: False
+
+ embedding_model: !new:speechbrain.lobes.models.Xvector.Xvector
+     in_channels: !ref <n_mels>
+     activation: !name:torch.nn.LeakyReLU
+     tdnn_blocks: 5
+     tdnn_channels: [512, 512, 512, 512, 1500]
+     tdnn_kernel_sizes: [5, 3, 3, 1, 1]
+     tdnn_dilations: [1, 2, 3, 1, 1]
+     lin_neurons: 512
+
+ classifier: !new:speechbrain.lobes.models.Xvector.Classifier
+     input_shape: [null, null, 512]
+     activation: !name:torch.nn.LeakyReLU
+     lin_blocks: 1
+     lin_neurons: 512
+     out_neurons: !ref <out_n_neurons>
+
+ mean_var_norm_emb: !new:speechbrain.processing.features.InputNormalization
+     norm_type: global
+     std_norm: False
+
+ modules:
+     compute_features: !ref <compute_features>
+     mean_var_norm: !ref <mean_var_norm>
+     embedding_model: !ref <embedding_model>
+     classifier: !ref <classifier>
+
+ label_encoder: !new:speechbrain.dataio.encoder.CategoricalEncoder
+
+
+ pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
+     loadables:
+         embedding_model: !ref <embedding_model>
+         classifier: !ref <classifier>
+         label_encoder: !ref <label_encoder>
+     paths:
+         embedding_model: !ref <pretrained_path>/embedding_model.ckpt
+         classifier: !ref <pretrained_path>/classifier.ckpt
+         label_encoder: !ref <pretrained_path>/label_encoder.txt
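To make the `tdnn_kernel_sizes` and `tdnn_dilations` settings above more concrete, the sketch below computes how many input frames one output frame of the TDNN stack can see. It assumes each TDNN block is a stride-1 dilated 1-D convolution, which the config does not state explicitly. The pooled dimension likewise assumes that statistics pooling concatenates a mean and a standard deviation per channel.

```python
def tdnn_receptive_field(kernel_sizes, dilations):
    """Input frames covered by one output frame of stacked dilated convs.

    Assumes stride 1 in every layer: each layer widens the context by
    (kernel_size - 1) * dilation frames.
    """
    field = 1
    for kernel, dilation in zip(kernel_sizes, dilations):
        field += (kernel - 1) * dilation
    return field


# Values copied from hyperparams.yaml above.
tdnn_channels = [512, 512, 512, 512, 1500]
tdnn_kernel_sizes = [5, 3, 3, 1, 1]
tdnn_dilations = [1, 2, 3, 1, 1]

context = tdnn_receptive_field(tdnn_kernel_sizes, tdnn_dilations)
print(context)  # 15

# Assumption: pooling emits mean and std for each of the last layer's channels.
pooled_dim = 2 * tdnn_channels[-1]
print(pooled_dim)  # 3000
```

Under these assumptions the last two blocks (kernel size 1) add no temporal context; they only mix channels before pooling.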
label_encoder.txt ADDED
@@ -0,0 +1,14 @@
+ 'yes' => 0
+ 'no' => 1
+ 'up' => 2
+ 'down' => 3
+ 'left' => 4
+ 'right' => 5
+ 'on' => 6
+ 'off' => 7
+ 'stop' => 8
+ 'go' => 9
+ 'unknown' => 10
+ 'silence' => 11
+ ================
+ 'starting_index' => 0
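The label file above maps each class name to an index, followed by a separator line and extra settings. As an illustration of the format only, here is a sketch that reads it into two dictionaries. The function name `parse_label_encoder` is hypothetical and this is not SpeechBrain's own loader; it assumes keys and values are Python literals separated by ` => `.

```python
import ast


def parse_label_encoder(text):
    """Return (label -> index, extra settings) from the text format above."""
    labels, extras = {}, {}
    target = labels
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) == {"="}:
            # A line made only of '=' separates labels from extra settings.
            target = extras
            continue
        key, _, value = line.partition(" => ")
        target[ast.literal_eval(key)] = ast.literal_eval(value)
    return labels, extras


sample = """'yes' => 0
'no' => 1
'unknown' => 10
'silence' => 11
================
'starting_index' => 0
"""
labels, extras = parse_label_encoder(sample)
# Invert the mapping to turn a predicted index back into a label.
index_to_label = {index: label for label, index in labels.items()}
print(index_to_label[11])  # silence
```

The full file has 12 entries, which matches `out_n_neurons: 12` in `hyperparams.yaml`.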
normalizer.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fcfa59116360f35b6ac5c46242d7ed3a9719892e5e8d314a6f9ae9281c3a2295
+ size 1153
yes.wav ADDED
Binary file (23.7 kB).