---
language: "ru"
thumbnail:
tags:
- automatic-speech-recognition
- CTC
- Attention
- pytorch
- speechbrain
license: "apache-2.0"
datasets:
- buriy-audiobooks-2-val
metrics:
- wer
- cer
---

| Release  | Test WER | GPUs       |
|:--------:|:--------:|:----------:|
| 22-05-11 |    -     | 1xK80 24GB |

## Pipeline description

(description adapted from the SpeechBrain model-card text)

This ASR system is composed of three different but linked blocks:
- Tokenizer (unigram) that transforms words into subword units, trained on
the training transcriptions of LibriSpeech.
- Neural language model (RNNLM) trained on the full (380K-word) dataset.
- Acoustic model (CRDNN + CTC/Attention). The CRDNN architecture is made of
N blocks of convolutional neural networks with normalisation and pooling in the
frequency domain. A bidirectional LSTM is then connected to a final DNN that produces
the acoustic representation given to the CTC and attention decoders.

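As a toy illustration of the tokenizer block, the sketch below segments a word into subword units by greedy longest match against a hand-picked vocabulary. This is only a rough approximation: a real unigram tokenizer (e.g. SentencePiece) learns its vocabulary from the training transcriptions and scores whole segmentations probabilistically, and the vocabulary here is invented.

```python
def subword_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedily split a word into the longest subword pieces found in vocab.

    Unknown single characters are kept as their own pieces so the loop
    always terminates.
    """
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or len(piece) == 1:
                pieces.append(piece)
                i = j
                break
    return pieces

vocab = {"го", "вор", "ить", "при"}
print(subword_tokenize("говорить", vocab))  # ['го', 'вор', 'ить']
```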
The system is trained with recordings sampled at 16 kHz (single channel).
The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *transcribe_file* if needed.

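To give an idea of what that normalization involves, here is a minimal sketch of mono conversion plus resampling to 16 kHz using `numpy` with naive linear interpolation. This is an illustration only; SpeechBrain performs the equivalent steps internally with proper audio resampling, and these helper names are not part of its API.

```python
import numpy as np

def to_mono(waveform: np.ndarray) -> np.ndarray:
    """Average a (channels, samples) array down to a single channel."""
    if waveform.ndim == 2:
        return waveform.mean(axis=0)
    return waveform

def resample_linear(waveform: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Naive linear-interpolation resampling (real systems use filtered resampling)."""
    duration = waveform.shape[-1] / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=waveform.shape[-1], endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, waveform)

# One second of stereo 44.1 kHz audio becomes 16000 mono samples.
stereo = np.random.randn(2, 44100)
mono_16k = resample_linear(to_mono(stereo), orig_sr=44100)
print(mono_16k.shape)  # (16000,)
```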
## Install SpeechBrain

First of all, please install SpeechBrain with the following command:

```bash
pip install speechbrain
```

Please note that SpeechBrain encourages you to read the tutorials and learn more about
[SpeechBrain](https://speechbrain.github.io).

### Transcribing your own audio files (in Russian)

```python
from speechbrain.pretrained import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="AndyGo/speech-brain-asr-crdnn-rnnlm-buriy-audiobooks-2-val",
    savedir="pretrained_models/speech-brain-asr-crdnn-rnnlm-buriy-audiobooks-2-val",
)
asr_model.transcribe_file("speech-brain-asr-crdnn-rnnlm-buriy-audiobooks-2-val/example.wav")
```

### Inference on GPU

To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
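For example, reusing the model identifier from the snippet above (this only shows where the `run_opts` parameter goes; running it requires a CUDA-capable GPU and downloads the pretrained model):

```python
from speechbrain.pretrained import EncoderDecoderASR

# Load the pretrained model directly onto the GPU.
asr_model = EncoderDecoderASR.from_hparams(
    source="AndyGo/speech-brain-asr-crdnn-rnnlm-buriy-audiobooks-2-val",
    savedir="pretrained_models/speech-brain-asr-crdnn-rnnlm-buriy-audiobooks-2-val",
    run_opts={"device": "cuda"},
)
```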