Update README.md
README.md CHANGED
@@ -41,40 +41,10 @@ if not os.path.exists("<RESAMPLED AUDIO FILE PATH>"):
 ```
 ## Training
 We used the official [NeMo documentation on training an ASR model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/examples/kinyarwanda_asr.html) to prepare our transcript manifest and train our model. However, we did not train a custom tokenizer; instead, we downloaded the tokenizer from [banglaBERT-large](https://huggingface.co/csebuetnlp/banglabert_large/) for better vocabulary coverage. For validation, we used `29589` samples separated from the training data and processed accordingly. The final validation score was `22.4%` WER at epoch `164`.
-
-```bash
-export TRAIN_MANIFEST_PATH="<TRAINING MANIFEST JSON>"
-export DEV_MANIFEST_PATH="<VALIDATION MANIFEST JSON>"
-export TOKENIZER_PATH="<TOKENIZER FOLDER>"
-export HYDRA_FULL_ERROR=1
-python [NEMO_GIT_FOLDER]/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py --config-path=[NEMO_GIT_FOLDER]/examples/asr/conf/conformer/ --config-name=conformer_ctc_bpe \
-    model.train_ds.manifest_filepath=${TRAIN_MANIFEST_PATH} \
-    model.validation_ds.manifest_filepath=${DEV_MANIFEST_PATH} \
-    model.tokenizer.dir=${TOKENIZER_PATH} \
-    model.tokenizer.type=wpe \
-    trainer.devices=4 \
-    trainer.accelerator="gpu" \
-    trainer.strategy="ddp" \
-    trainer.max_epochs=1000 \
-    model.optim.name="adamw" \
-    model.optim.lr=0.001 \
-    model.optim.betas=[0.9,0.999] \
-    model.optim.weight_decay=0.0001 \
-    model.optim.sched.warmup_steps=2000 \
-    exp_manager.exp_dir=results/ \
-    exp_manager.create_wandb_logger=False \
-    exp_manager.resume_if_exists=true
-```
+Training script: [training.sh](training.sh)
 ## Evaluation
 `14,016` test samples were used to evaluate the model. The generated output file contains both the ground-truth and predicted strings. The final results are the Word Error Rate (WER) and Character Error Rate (CER) of the model.
-
-export HYDRA_FULL_ERROR=1
-python3 [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py \
-    model_path="<PRETRAINED MODEL PATH>" \
-    dataset_manifest="<TEST MANIFEST JSON>" \
-    output_filename=test_with_predictions.json \
-    batch_size=1
-```
+Evaluation script: [evaluation.sh](evaluation.sh)
 
 **Test Dataset WER/CER 69.25%/42.13%**
 ## Inference
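
The Training section above swaps the custom tokenizer for the published banglaBERT-large vocabulary but does not show how the folder exported as `TOKENIZER_PATH` is prepared. Below is a minimal sketch, not taken from the repository's training.sh, assuming the `transformers` package is available and that NeMo's `model.tokenizer.type=wpe` reads the WordPiece files (`vocab.txt` and friends) from `model.tokenizer.dir`; the folder name `tokenizer_bbert_large` is illustrative.

```bash
# Hedged sketch (not the repository's training.sh): fetch the banglaBERT-large
# WordPiece tokenizer and save it into the folder later exported as TOKENIZER_PATH.
pip install transformers

python3 - <<'EOF'
from transformers import AutoTokenizer

# Downloads vocab.txt, tokenizer_config.json, special_tokens_map.json, ...
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert_large")
tokenizer.save_pretrained("tokenizer_bbert_large")  # folder name is an assumption
EOF

export TOKENIZER_PATH="$(pwd)/tokenizer_bbert_large"
```

If the saved layout differs across `transformers` versions, point `model.tokenizer.dir` at wherever `vocab.txt` ends up.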
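
The Evaluation section reports WER/CER computed from `test_with_predictions.json`, but the scoring step itself is not shown. Below is a minimal sketch, not the repository's evaluation.sh, assuming each JSON line keeps the ground truth under `text` and the model output under `pred_text` (the fields `transcribe_speech.py` normally writes) and that the `jiwer` package is acceptable for scoring.

```bash
# Hedged sketch (not the repository's evaluation.sh): compute corpus-level WER/CER
# from the manifest written by transcribe_speech.py.
pip install jiwer

python3 - <<'EOF'
import json
from jiwer import cer, wer

references, hypotheses = [], []
with open("test_with_predictions.json") as f:
    for line in f:
        entry = json.loads(line)
        references.append(entry["text"])       # ground-truth transcript
        hypotheses.append(entry["pred_text"])  # model prediction

print(f"WER: {100 * wer(references, hypotheses):.2f}%  "
      f"CER: {100 * cer(references, hypotheses):.2f}%")
EOF
```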