Commit 0262e52 by dwgnr (1 parent: 11881d1)

update readme
Files changed (1): README.md (+35 -28)
README.md CHANGED
@@ -14,7 +14,8 @@ datasets:
 - switchboard
 metrics:
 - wer
-- ser
+- cer
+
 ---
 
 <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
@@ -23,28 +24,26 @@ metrics:
 # Transformer for Switchboard (with Transformer LM)
 
 This repository provides all the necessary tools to perform automatic speech
-recognition from an end-to-end system pretrained on Switchboard within
-SpeechBrain. For a better experience, we encourage you to learn more about
-[SpeechBrain](https://speechbrain.github.io).
-The performance of the model is the following:
+recognition from an end-to-end system pretrained on Switchboard (EN) within SpeechBrain.
+For a better experience, we encourage you to learn more about [SpeechBrain](https://speechbrain.github.io).
 
+The performance of the model is the following:
 
-| Release | Swbd SER | Callhome SER | Eval2000 SER | Swbd WER | Callhome WER | Eval2000 WER | GPUs |
-|:--------:|:--------:|:------------:|:------------:|:--------:|:------------:|:------------:|:-----------:|
-| 17-09-22 | 49.30 | 56.89 | 54.20 | 9.80 | 17.89 | 13.94 | 1xA100 40GB |
+| Release | Swbd WER | Callhome WER | Eval2000 WER | GPUs |
+|:--------:|:--------:|:------------:|:------------:|:-----------:|
+| 17-09-22 | 9.80 | 17.89 | 13.94 | 1xA100 40GB |
 
 
-## Pipeline description
+## Pipeline Description
 
 This ASR system is composed of 3 different but linked blocks:
-- Tokenizer (unigram) that transforms words into subword units and trained with
-the train transcriptions of LibriSpeech.
-- Neural language model (Transformer LM) trained on the Switchboard training set and the Fisher corpus.
+- Tokenizer (unigram) that transforms words into subword units trained on the Switchboard training transcriptions and the Fisher corpus.
+- Neural language model (Transformer LM) trained on the Switchboard training transcriptions and the Fisher corpus.
 - Acoustic model made of a transformer encoder and a joint decoder with CTC +
 transformer. Hence, the decoding also incorporates the CTC probabilities.
 
 The system is trained with recordings sampled at 16kHz (single channel).
-The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *transcribe_file* if needed.
+The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling `transcribe_file` if needed.
 
 ## Install SpeechBrain
 
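A note on the normalization line added above: `transcribe_file` resamples and downmixes audio internally, but the same preprocessing can be reproduced by hand. A minimal sketch with `torchaudio` (the filename is a placeholder, and averaging channels is one reasonable reading of "mono channel selection"):

```python
import torchaudio

# Approximate what transcribe_file does before decoding:
# resample to the 16 kHz, single-channel format the model was trained on.
waveform, sample_rate = torchaudio.load("my_recording.wav")  # placeholder path
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
```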
@@ -57,26 +56,32 @@ pip install speechbrain
 Please notice that we encourage you to read our tutorials and learn more about
 [SpeechBrain](https://speechbrain.github.io).
 
-### Transcribing your own audio files (in English)
+## Transcribing Your Own Audio Files
 
 ```python
 from speechbrain.pretrained import EncoderDecoderASR
 asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-transformer-switchboard", savedir="pretrained_models/asr-transformer-switchboard")
 asr_model.transcribe_file("path/to/your/audiofile")
 ```
-### Inference on GPU
+
+## Inference on GPU
+
 To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
 
 ## Parallel Inference on a Batch
+
 Please, [see this Colab notebook](https://colab.research.google.com/drive/1hX5ZI9S4jHIjahFCZnhwwQmFoGAi3tmu?usp=sharing) to figure out how to transcribe in parallel a batch of input sentences using a pre-trained model.
 
-### Training
-The model was trained with SpeechBrain (Commit hash: '70904d0').
+## Training
+
+The model was trained with SpeechBrain (commit hash: `70904d0`).
 To train it from scratch follow these steps:
+
 1. Clone SpeechBrain:
 ```bash
 git clone https://github.com/speechbrain/speechbrain/
 ```
+
 2. Install it:
 ```bash
 cd speechbrain
@@ -90,20 +95,26 @@ cd recipes/Switchboard/ASR/transformer
 python train.py hparams/transformer.yaml --data_folder=your_data_folder
 ```
 
-You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1ZudxqMWb8VNCJKvY2Ws5oNY3WI1To0I7?usp=sharing).
+## Limitations
 
-### Limitations
 The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
 
-# **About SpeechBrain**
+## Credits
+
+This model was trained with resources provided by the [THN Center for AI](https://www.th-nuernberg.de/en/kiz).
+
+# About SpeechBrain
+
+SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly.
+Competitive or state-of-the-art performance is obtained in various domains.
+
 - Website: https://speechbrain.github.io/
-- Code: https://github.com/speechbrain/speechbrain/
+- GitHub: https://github.com/speechbrain/speechbrain/
 - HuggingFace: https://huggingface.co/speechbrain/
 
-# **Citing SpeechBrain**
-Please, cite SpeechBrain if you use it for your research or business.
-
+# Citing SpeechBrain
+
+Please cite SpeechBrain if you use it for your research or business.
 
 ```bibtex
 @misc{speechbrain,
@@ -116,7 +127,3 @@ Please, cite SpeechBrain if you use it for your research or business.
 note={arXiv:2106.04624}
 }
 ```
-
-#### Credits
-
-This model was trained with resources provided by the [KIZ](https://www.th-nuernberg.de/en/facilities/competence-centers/center-for-artificial-intelligence-kiz/) Cluster at TH Nürnberg.
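On the "Inference on GPU" note in the diff above: the `run_opts={"device":"cuda"}` argument is the only change needed at load time. A minimal sketch, with a CPU fallback added for machines without CUDA:

```python
import torch
from speechbrain.pretrained import EncoderDecoderASR

# run_opts={"device": "cuda"} moves the pretrained model to the GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-transformer-switchboard",
    savedir="pretrained_models/asr-transformer-switchboard",
    run_opts={"device": device},
)
print(asr_model.transcribe_file("path/to/your/audiofile"))
```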
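For the batched decoding that the linked Colab notebook demonstrates, `EncoderDecoderASR` also exposes `transcribe_batch`, which takes padded waveforms plus lengths relative to the longest clip. A rough sketch (the two file paths are placeholders, and attribute names can vary across SpeechBrain versions):

```python
import torch
from speechbrain.pretrained import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-transformer-switchboard",
    savedir="pretrained_models/asr-transformer-switchboard",
)

# load_audio applies the same resampling/mono normalization as transcribe_file.
wavs = [asr_model.load_audio(p) for p in ["first.wav", "second.wav"]]  # placeholder paths

# Pad to a common length; transcribe_batch expects relative lengths.
batch = torch.nn.utils.rnn.pad_sequence(wavs, batch_first=True)
rel_lens = torch.tensor([w.shape[0] for w in wavs], dtype=torch.float) / batch.shape[1]
predicted_words, predicted_tokens = asr_model.transcribe_batch(batch, rel_lens)
print(predicted_words)
```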
 