upload model

Files changed (8)

.gitattributes +3 -0
README.md +122 -0
classifier.ckpt +3 -0
embedding_model.ckpt +3 -0
hyperparams.yaml +60 -0
label_encoder.txt +14 -0
normalizer.ckpt +3 -0
yes.wav +0 -0

.gitattributes CHANGED

@@ -14,3 +14,6 @@
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
+classifier.ckpt filter=lfs diff=lfs merge=lfs -text
+embedding_model.ckpt filter=lfs diff=lfs merge=lfs -text
+normalizer.ckpt filter=lfs diff=lfs merge=lfs -text
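The three added lines route the checkpoint files through Git LFS by exact filename. The sketch below approximates which files end up in LFS. It uses Python's `fnmatch` as a stand-in for Git's own attribute matching (the two should agree for simple, slash-free patterns like these), and it only knows the patterns visible in this hunk. Lines 1-13 of the real file are not shown, so `LFS_PATTERNS` is a deliberately partial, hypothetical list.

```python
from fnmatch import fnmatch

# Only the patterns visible in the hunk above (hypothetical, partial list):
# lines 1-13 of the real .gitattributes are not shown in this diff.
LFS_PATTERNS = [
    "*.pb",
    "*.pt",
    "*.pth",
    "classifier.ckpt",
    "embedding_model.ckpt",
    "normalizer.ckpt",
]


def is_lfs_tracked(filename: str) -> bool:
    """Return True if `filename` matches any of the visible LFS patterns."""
    return any(fnmatch(filename, pattern) for pattern in LFS_PATTERNS)


for name in ["classifier.ckpt", "embedding_model.ckpt", "normalizer.ckpt",
             "hyperparams.yaml", "label_encoder.txt"]:
    print(f"{name}: {'LFS' if is_lfs_tracked(name) else 'regular git'}")
```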

README.md ADDED

	@@ -0,0 +1,122 @@

+---
+language: "en"
+thumbnail:
+tags:
+- embeddings
+- Speaker
+- Verification
+- Identification
+- pytorch
+- xvectors
+- TDNN
+license: "apache-2.0"
+datasets:
+- voxceleb
+metrics:
+- EER
+- min_dct
+---
+<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
+<br/><br/>
+# Speaker Verification with xvector embeddings on Voxceleb
+This repository provides all the necessary tools to extract speaker embeddings with a pretrained TDNN model using SpeechBrain.
+The system is trained on the Voxceleb1 + Voxceleb2 training data.
+For a better experience, we encourage you to learn more about
+[SpeechBrain](https://speechbrain.github.io). The model's performance on the Voxceleb1 test set (cleaned) is:
+
+| Release | EER (%) |
+|:-------------:|:--------------:|
+| 05-03-21 | 3.2 |
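The equal error rate (EER) is the operating point where the false-acceptance rate and the false-rejection rate are equal; an EER of 3.2% means that, at that threshold, about 3.2% of trials are misclassified in each direction. A minimal brute-force sketch of the metric in plain Python, assuming higher scores mean "same speaker" and that trials scoring at or above the threshold are accepted. The scores in the example are made up for illustration and are not VoxCeleb results.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by trying every observed score as the threshold."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        # False rejection: genuine (same-speaker) trials scored below t.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scored at or above t.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy trials (hypothetical scores).
eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
print(f"EER = {eer:.1%}")  # EER = 25.0%
```

Production toolkits usually interpolate along the ROC curve instead of sweeping raw scores, which matters little for large trial lists.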
+## Pipeline description
+This system is composed of a TDNN model coupled with statistics pooling. The system is trained with a categorical cross-entropy loss.
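Statistics pooling is the step that turns the variable-length sequence of frame-level TDNN outputs into one fixed-size vector per utterance: it concatenates the per-channel mean and standard deviation over time. A simplified, framework-free sketch (it uses the population standard deviation; SpeechBrain's own pooling layer may differ in such details):

```python
import math


def statistics_pooling(frames):
    """Pool a variable-length sequence of frame vectors into one fixed vector.

    `frames` is a list of T frames, each a list of C floats (time x channels).
    Returns the per-channel means followed by the per-channel standard
    deviations, i.e. a vector of length 2 * C regardless of T.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    means = [sum(f[c] for f in frames) / num_frames for c in range(num_channels)]
    stds = [
        math.sqrt(sum((f[c] - means[c]) ** 2 for f in frames) / num_frames)
        for c in range(num_channels)
    ]
    return means + stds


# Three frames, two channels: channel 0 varies over time, channel 1 is constant.
pooled = statistics_pooling([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]])
print(pooled)
```

With the 1500-channel last TDNN layer configured in hyperparams.yaml, the pooled vector would have 3000 values, which the following linear layer maps to the 512-dimensional embedding (`lin_neurons: 512`).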
+## Install SpeechBrain
+First of all, please install SpeechBrain with the following command:
+```
+pip install speechbrain
+```
+Please note that we encourage you to read our tutorials and learn more about
+[SpeechBrain](https://speechbrain.github.io).
+### Compute your speaker embeddings
+```python
+import torchaudio
+from speechbrain.pretrained import EncoderClassifier
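+# Fetch the pretrained model from the HuggingFace Hub (cached in savedir)
+classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb", savedir="pretrained_models/spkrec-xvect-voxceleb")
+# Load a waveform; fs is its sampling rate
+signal, fs = torchaudio.load('samples/audio_samples/example1.wav')
+# Compute the speaker embedding(s) for the batch
+embeddings = classifier.encode_batch(signal)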
+```
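For verification, two utterances are typically compared by scoring their embeddings; cosine scoring is one common choice (PLDA is another). The sketch below is framework-free and assumes each embedding has already been flattened to a Python list, e.g. with `embeddings.squeeze().tolist()` for a single-utterance batch. `same_speaker` and its 0.5 threshold are hypothetical placeholders, not part of the SpeechBrain API; a real threshold would have to be tuned on held-out trials.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Hypothetical decision rule: accept when the cosine score exceeds `threshold`.

    The 0.5 default is a placeholder, not a tuned value.
    """
    return cosine_similarity(emb_a, emb_b) > threshold


print(same_speaker([0.2, 0.9, -0.1], [0.25, 0.8, 0.0]))
```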
+### Inference on GPU
+To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
+### Training
+The model was trained with SpeechBrain (aa018540).
+To train it from scratch, follow these steps:
+1. Clone SpeechBrain:
+```bash
+git clone https://github.com/speechbrain/speechbrain/
+```
+2. Install it:
+```bash
+cd speechbrain
+pip install -r requirements.txt
+pip install -e .
+```
+3. Run Training:
+```bash
+cd recipes/VoxCeleb/SpeakerRec/
+python train_speaker_embeddings.py hparams/train_x_vectors.yaml --data_folder=your_data_folder
+```
+You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/1RtCBJ3O8iOCkFrJItCKT9oL-Q1MNCwMH?usp=sharing).
+### Limitations
+The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
+#### Referencing xvectors
+```bibtex
+@inproceedings{DBLP:conf/odyssey/SnyderGMSPK18,
+  author    = {David Snyder and
+               Daniel Garcia{-}Romero and
+               Alan McCree and
+               Gregory Sell and
+               Daniel Povey and
+               Sanjeev Khudanpur},
+  title     = {Spoken Language Recognition using X-vectors},
+  booktitle = {Odyssey 2018},
+  pages     = {105--111},
+  year      = {2018},
+}
+```
+#### Referencing SpeechBrain
+```bibtex
+@misc{SB2021,
+    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Chou, Ju-Chieh and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
+    title = {SpeechBrain},
+    year = {2021},
+    publisher = {GitHub},
+    journal = {GitHub repository},
+    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+  }
+```
+#### About SpeechBrain
+SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.
+Website: https://speechbrain.github.io/
+GitHub: https://github.com/speechbrain/speechbrain

classifier.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9cfda9e191640d9e7c67f9c5a2dd60d3beddb917bbc0a70c0e5a8d8d32b6b14b
+size 1096813
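The `.ckpt` files are committed as Git LFS pointers rather than as the weights themselves: three `key value` lines giving the spec version, the object id (`sha256:` plus a hex digest) and the size in bytes (1,096,813 bytes, about 1.1 MB, for `classifier.ckpt`). The parser and `matches_pointer` check below are a hypothetical illustration of that format, not part of the git-lfs tooling; they show how a downloaded checkpoint could be checked against its pointer.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The object id is written as "<hash-algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, info):
    """Check raw file bytes against a parsed pointer (SHA-256 only)."""
    return len(data) == info["size"] and hashlib.sha256(data).hexdigest() == info["digest"]


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:9cfda9e191640d9e7c67f9c5a2dd60d3beddb917bbc0a70c0e5a8d8d32b6b14b
size 1096813
"""
info = parse_lfs_pointer(pointer)
print(info["algorithm"], info["size"])  # sha256 1096813
```

The same parser applies unchanged to the `embedding_model.ckpt` and `normalizer.ckpt` pointers.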

embedding_model.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:35736f2cc0e6d7562ef2f4c805ee58a91308871bbe76e0cc3efd1c443e114e89
+size 16887676

hyperparams.yaml ADDED

	@@ -0,0 +1,60 @@

+# ############################################################################
+# Model: xvector for Command Recognition with Google Speech Commands
+# ############################################################################
+# Pretrain folder (HuggingFace)
+pretrained_path: speechbrain/google_speech_command_xvector
+# Feature parameters
+n_mels: 24
+# Output parameters
+out_n_neurons: 12 # 12 command version
+# Model params
+compute_features: !new:speechbrain.lobes.features.Fbank
+    n_mels: !ref <n_mels>
+mean_var_norm: !new:speechbrain.processing.features.InputNormalization
+    norm_type: sentence
+    std_norm: False
+embedding_model: !new:speechbrain.lobes.models.Xvector.Xvector
+    in_channels: !ref <n_mels>
+    activation: !name:torch.nn.LeakyReLU
+    tdnn_blocks: 5
+    tdnn_channels: [512, 512, 512, 512, 1500]
+    tdnn_kernel_sizes: [5, 3, 3, 1, 1]
+    tdnn_dilations: [1, 2, 3, 1, 1]
+    lin_neurons: 512
+classifier: !new:speechbrain.lobes.models.Xvector.Classifier
+    input_shape: [null, null, 512]
+    activation: !name:torch.nn.LeakyReLU
+    lin_blocks: 1
+    lin_neurons: 512
+    out_neurons: !ref <out_n_neurons>
+mean_var_norm_emb: !new:speechbrain.processing.features.InputNormalization
+    norm_type: global
+    std_norm: False
+modules:
+    compute_features: !ref <compute_features>
+    mean_var_norm: !ref <mean_var_norm>
+    embedding_model: !ref <embedding_model>
+    classifier: !ref <classifier>
+label_encoder: !new:speechbrain.dataio.encoder.CategoricalEncoder
+pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
+    loadables:
+        embedding_model: !ref <embedding_model>
+        classifier: !ref <classifier>
+        label_encoder: !ref <label_encoder>
+    paths:
+        embedding_model: !ref <pretrained_path>/embedding_model.ckpt
+        classifier: !ref <pretrained_path>/classifier.ckpt
+        label_encoder: !ref <pretrained_path>/label_encoder.txt
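Each TDNN layer in `embedding_model` is a dilated 1-D convolution over time, so the frame-level context of the whole stack follows from `tdnn_kernel_sizes` and `tdnn_dilations`: each layer widens it by (kernel_size - 1) * dilation frames. A small helper (hypothetical, not part of SpeechBrain) applied to the values in this file:

```python
def tdnn_context(kernel_sizes, dilations):
    """Number of input frames seen by one output frame of stacked TDNN layers.

    Each layer widens the context by (kernel_size - 1) * dilation frames.
    """
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


# Values copied from `embedding_model` in hyperparams.yaml.
context = tdnn_context([5, 3, 3, 1, 1], [1, 2, 3, 1, 1])
print(context)  # 15
```

So each frame-level output sees 15 input frames (7 on either side of the center frame); the last two layers, with kernel size 1, add no temporal context, and utterance-level information comes from the statistics pooling that follows.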

label_encoder.txt ADDED

	@@ -0,0 +1,14 @@

+'yes' => 0
+'no' => 1
+'up' => 2
+'down' => 3
+'left' => 4
+'right' => 5
+'on' => 6
+'off' => 7
+'stop' => 8
+'go' => 9
+'unknown' => 10
+'silence' => 11
+================
+'starting_index' => 0
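`label_encoder.txt` maps each of the 12 classes (ten command words plus `unknown` and `silence`) to one classifier output, matching `out_n_neurons: 12` in hyperparams.yaml. In practice the `label_encoder` entry of the pretrainer loads this file into SpeechBrain's `CategoricalEncoder`; the stand-alone parser below is only a hypothetical illustration of the file format and of decoding an argmax over the classifier's 12 scores (the scores are made up).

```python
import ast

ENCODER_TXT = """'yes' => 0
'no' => 1
'up' => 2
'down' => 3
'left' => 4
'right' => 5
'on' => 6
'off' => 7
'stop' => 8
'go' => 9
'unknown' => 10
'silence' => 11
================
'starting_index' => 0
"""


def parse_label_encoder(text):
    """Map class indices to labels from a label_encoder.txt-style string.

    Lines above the '=====' separator look like "'label' => index"; lines
    below it hold metadata (e.g. the starting index) and are skipped here.
    """
    index_to_label = {}
    for line in text.strip().splitlines():
        if line.startswith("="):
            break
        label, _, index = line.partition(" => ")
        index_to_label[int(index)] = ast.literal_eval(label)
    return index_to_label


def predict_label(scores, index_to_label):
    """Return the label whose score is highest (a plain argmax)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return index_to_label[best]


labels = parse_label_encoder(ENCODER_TXT)
scores = [0.1] * len(labels)
scores[3] = 2.5  # made-up classifier output favoring index 3
print(predict_label(scores, labels))  # down
```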

normalizer.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fcfa59116360f35b6ac5c46242d7ed3a9719892e5e8d314a6f9ae9281c3a2295
+size 1153

yes.wav ADDED

Binary file (23.7 kB).