imedennikov
committed
Commit dd1e3ba
Parent(s): 2928bdb
Update README.md
README.md CHANGED
```diff
@@ -121,7 +121,7 @@ pip install nemo_toolkit['asr']
 
 ## How to Use this Model
 
-The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
+The model is available for use in the NeMo Framework [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
 
 ### Automatically instantiate the model
 
@@ -164,7 +164,7 @@ TDT (Token-and-Duration Transducer) [2] is a generalization of conventional Tran
 
 ## Training
 
-The NeMo toolkit [3] was used for training this model with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/fastconformer/hybrid_transducer_ctc/fastconformer_hybrid_transducer_ctc_bpe.yaml).
+The NeMo Framework [3] was used for training this model with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/fastconformer/hybrid_transducer_ctc/fastconformer_hybrid_transducer_ctc_bpe.yaml).
 
 The model was trained for 300k steps with dynamic bucketing and a batch duration of 600s per GPU on 32 NVIDIA A100 80GB GPUs, and then finetuned for 100k additional steps on the modified training data (predicted texts for training samples with CER>10%).
 
@@ -204,7 +204,7 @@ Check out [Riva live demo](https://developer.nvidia.com/riva#demos).
 
 [2] [Efficient Sequence Transduction by Jointly Predicting Tokens and Durations](https://arxiv.org/abs/2304.06795)
 
-[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+[3] [NVIDIA NeMo Framework](https://github.com/NVIDIA/NeMo)
 
 [4] [Google SentencePiece Tokenizer](https://github.com/google/sentencepiece)
```
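
The usage line touched in the first hunk loads the checkpoint through the NeMo Framework. A minimal sketch of what that typically looks like, assuming `nemo_toolkit['asr']` is installed as shown in the hunk header; the model identifier below is a placeholder, since the card's actual Hub name is not visible in this diff:

```python
# Minimal inference sketch (assumed usage, not text from the card itself).
import nemo.collections.asr as nemo_asr

# Placeholder identifier: substitute this model card's actual Hub name.
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="<org>/<model_name>")

# Transcribe a list of audio files (16 kHz mono WAV works out of the box).
transcriptions = asr_model.transcribe(["sample.wav"])
print(transcriptions)
```

`ASRModel.from_pretrained` resolves the concrete model class from the checkpoint itself, so the sketch does not need to name the hybrid TDT/CTC class explicitly.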
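The Training paragraph in the second hunk links the Hydra-driven example script and base config. The sketch below shows how such a run is typically launched from the root of a NeMo checkout; it is an assumed invocation, and every manifest path and override value is a placeholder rather than a setting taken from this card. The `subprocess` wrapper only keeps the sketch in Python; in practice the same command is typed in a shell.

```python
# Assumed launch of the linked example script with Hydra-style overrides.
# All paths and values are placeholders; run from the root of a NeMo checkout.
import subprocess

subprocess.run(
    [
        "python",
        "examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_bpe.py",
        "--config-path=../conf/fastconformer/hybrid_transducer_ctc",
        "--config-name=fastconformer_hybrid_transducer_ctc_bpe",
        # Hypothetical NeMo JSON-lines manifests for train/validation data.
        "model.train_ds.manifest_filepath=/data/train_manifest.json",
        "model.validation_ds.manifest_filepath=/data/dev_manifest.json",
        # Use all visible GPUs; the card's real run used 32x A100 80GB.
        "trainer.devices=-1",
    ],
    check=True,
)
```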