Update README.md
README.md CHANGED
@@ -41,40 +41,10 @@ if not os.path.exists("<RESAMPLED AUDIO FILE PATH>"):
 ```
 ## Training
 We used the official [NeMo documentation on training an ASR model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/examples/kinyarwanda_asr.html) to prepare our transcript manifest and train our model. However, we did not train a custom tokenizer; instead, we downloaded the tokenizer from [banglaBERT-large](https://huggingface.co/csebuetnlp/banglabert_large/) for better vocabulary coverage. For validation, we used `29589` samples separated from the training data and processed accordingly. The final validation score was `22.4%` WER at epoch `164`.
-
-```bash
-export TRAIN_MANIFEST_PATH="<TRAINING MANIFEST JSON>"
-export DEV_MANIFEST_PATH="<VALIDATION MANIFEST JSON>"
-export TOKENIZER_PATH="<TOKENIZER FOLDER>"
-export HYDRA_FULL_ERROR=1
-python [NEMO_GIT_FOLDER]/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py --config-path=[NEMO_GIT_FOLDER]/examples/asr/conf/conformer/ --config-name=conformer_ctc_bpe \
-    model.train_ds.manifest_filepath=${TRAIN_MANIFEST_PATH} \
-    model.validation_ds.manifest_filepath=${DEV_MANIFEST_PATH} \
-    model.tokenizer.dir=${TOKENIZER_PATH} \
-    model.tokenizer.type=wpe \
-    trainer.devices=4 \
-    trainer.accelerator="gpu" \
-    trainer.strategy="ddp" \
-    trainer.max_epochs=1000 \
-    model.optim.name="adamw" \
-    model.optim.lr=0.001 \
-    model.optim.betas=[0.9,0.999] \
-    model.optim.weight_decay=0.0001 \
-    model.optim.sched.warmup_steps=2000 \
-    exp_manager.exp_dir=results/ \
-    exp_manager.create_wandb_logger=False \
-    exp_manager.resume_if_exists=true
-```
+Training script: [training.sh](training.sh)
 ## Evaluation
 `14,016` test samples were used to evaluate the model. The generated output file contains both the ground-truth and predicted strings. The final results are the Word Error Rate (WER) and Character Error Rate (CER) of the model.
-
-export HYDRA_FULL_ERROR=1
-python3 [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py \
-    model_path="<PRETRAINED MODEL PATH>" \
-    dataset_manifest="<TEST MANIFEST JSON>" \
-    output_filename=test_with_predictions.json \
-    batch_size=1
-```
+Evaluation script: [evaluation.sh](evaluation.sh)
 
 **Test Dataset WER/CER 69.25%/42.13%**
 ## Inference
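
The Training section above swaps the custom tokenizer for the published banglaBERT-large vocabulary but does not show how the folder exported as `TOKENIZER_PATH` is prepared. Below is a minimal sketch, not taken from the repository's training.sh, assuming the `transformers` package is available and that NeMo's `model.tokenizer.type=wpe` reads the WordPiece files (`vocab.txt` and friends) from `model.tokenizer.dir`; the folder name `tokenizer_bbert_large` is illustrative.

```bash
# Hedged sketch (not the repository's training.sh): fetch the banglaBERT-large
# WordPiece tokenizer and save it into the folder later exported as TOKENIZER_PATH.
pip install transformers

python3 - <<'EOF'
from transformers import AutoTokenizer

# Downloads vocab.txt, tokenizer_config.json, special_tokens_map.json, ...
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert_large")
tokenizer.save_pretrained("tokenizer_bbert_large")  # folder name is an assumption
EOF

export TOKENIZER_PATH="$(pwd)/tokenizer_bbert_large"
```

If the saved layout differs across `transformers` versions, point `model.tokenizer.dir` at wherever `vocab.txt` ends up.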
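
The Evaluation section reports WER/CER computed from `test_with_predictions.json`, but the scoring step itself is not shown. Below is a minimal sketch, not the repository's evaluation.sh, assuming each JSON line keeps the ground truth under `text` and the model output under `pred_text` (the fields `transcribe_speech.py` normally writes) and that the `jiwer` package is acceptable for scoring.

```bash
# Hedged sketch (not the repository's evaluation.sh): compute corpus-level WER/CER
# from the manifest written by transcribe_speech.py.
pip install jiwer

python3 - <<'EOF'
import json
from jiwer import cer, wer

references, hypotheses = [], []
with open("test_with_predictions.json") as f:
    for line in f:
        entry = json.loads(line)
        references.append(entry["text"])       # ground-truth transcript
        hypotheses.append(entry["pred_text"])  # model prediction

print(f"WER: {100 * wer(references, hypotheses):.2f}%  "
      f"CER: {100 * cer(references, hypotheses):.2f}%")
EOF
```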