speechbrainteam committed
Commit 4ceee33 · 1 Parent(s): a6ead3b
Update README.md
README.md
CHANGED
@@ -25,7 +25,7 @@ recognition from an end-to-end system pretrained on CommonVoice (French Language
 SpeechBrain. For a better experience, we encourage you to learn more about
 [SpeechBrain](https://speechbrain.github.io).
 
-The performance of the model is the following
+The performance of the model is the following:
 
 | Release | Test CER | Test WER | GPUs |
 |:-------------:|:--------------:|:--------------:| :--------:|
@@ -34,9 +34,9 @@ The performance of the model is the following.
 ## Pipeline description
 
 This ASR system is composed of 2 different but linked blocks:
-1
+1-Tokenizer (unigram) that transforms words into subword units and trained with
 the train transcriptions (train.tsv) of CommonVoice (FR).
-2
+2- Acoustic model (CRDNN + CTC/Attention). The CRDNN architecture is made of
 N blocks of convolutional neural networks with normalization and pooling on the
 frequency domain. Then, a bidirectional LSTM is connected to a final DNN to obtain
 the final acoustic representation that is given to the CTC and attention decoders.
@@ -78,7 +78,7 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
 year = {2021},
 publisher = {GitHub},
 journal = {GitHub repository},
-howpublished = {
+howpublished = {\url{https://github.com/speechbrain/speechbrain}},
 }
 ```
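The updated pipeline description covers two numeric ideas worth making concrete: the CRDNN's convolutional blocks pool along the frequency axis before the BiLSTM, and the final representation feeds joint CTC and attention decoders. Below is a minimal, pure-Python sketch of both computations. Every concrete number here (40 filterbank features, 2 blocks, 2x pooling, 0.3 CTC weight) is an illustrative assumption, not this model's actual hyperparameters.

```python
def freq_dim_after_cnn(n_features: int, n_blocks: int, pool_factor: int = 2) -> int:
    """Frequency dimension remaining after N conv blocks,
    each pooling the frequency axis by pool_factor."""
    dim = n_features
    for _ in range(n_blocks):
        dim //= pool_factor  # pooling on the frequency domain shrinks the axis
    return dim

def joint_score(log_p_ctc: float, log_p_att: float, ctc_weight: float = 0.3) -> float:
    """Log-linear combination of CTC and attention decoder scores,
    as is typical for joint CTC/attention decoding."""
    return ctc_weight * log_p_ctc + (1.0 - ctc_weight) * log_p_att

# With 40-dim filterbank features and 2 blocks of 2x frequency pooling,
# the BiLSTM sees 10 frequency bins per channel.
print(freq_dim_after_cnn(40, 2))           # -> 10
# Equal weighting of two hypothetical decoder log-probabilities:
print(joint_score(-1.0, -2.0, 0.5))        # -> -1.5
```

The pooling arithmetic explains why the BiLSTM input size depends on both the feature extractor and the CNN configuration; the score combination shows how the two decoders named in the description can contribute to one beam-search hypothesis score.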