appledora committed
Commit bcd144a
1 Parent(s): b88c663

Update README.md

Files changed (1)
  1. README.md +2 -32
README.md CHANGED
@@ -41,40 +41,10 @@ if not os.path.exists("<RESAMPLED AUDIO FILE PATH>"):
  ```
  ## Training
  We used the official [NeMo documentation on training an ASR model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/examples/kinyarwanda_asr.html) to prepare our transcript manifest and train our model. However, we did not train any custom tokenizer and instead downloaded the tokenizer from [banglaBERT-large](https://huggingface.co/csebuetnlp/banglabert_large/) for better vocabulary coverage. For validation, we used `29589` samples separated from the training data and processed accordingly. The final validation score was `22.4% WER`, at epoch `164`.
- Final Training script:
- ```bash
- export TRAIN_MANIFEST_PATH="<TRAINING MANIFEST JSON>"
- export DEV_MANIFEST_PATH="<VALIDATION MANIFEST JSON>"
- export TOKENIZER_PATH="<TOKENIZER FOLDER>"
- export HYDRA_FULL_ERROR=1
- python [NEMO_GIT_FOLDER]/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py --config-path=[NEMO_GIT_FOLDER]/examples/asr/conf/conformer/ --config-name=conformer_ctc_bpe \
- model.train_ds.manifest_filepath=${TRAIN_MANIFEST_PATH} \
- model.validation_ds.manifest_filepath=${DEV_MANIFEST_PATH} \
- model.tokenizer.dir=${TOKENIZER_PATH} \
- model.tokenizer.type=wpe \
- trainer.devices=4 \
- trainer.accelerator="gpu" \
- trainer.strategy="ddp" \
- trainer.max_epochs=1000 \
- model.optim.name="adamw" \
- model.optim.lr=0.001 \
- model.optim.betas=[0.9,0.999] \
- model.optim.weight_decay=0.0001 \
- model.optim.sched.warmup_steps=2000 \
- exp_manager.exp_dir=results/ \
- exp_manager.create_wandb_logger=False \
- exp_manager.resume_if_exists=true
- ```
+ Training script: [training.sh](training.sh)
  ## Evaluation
  `14,016` test samples have been used to evaluate the model. The generated output file contains both the ground-truth and predicted strings. The final result is the Word Error Rate (WER) and Character Error Rate (CER) for the model.
- ```bash
- export HYDRA_FULL_ERROR=1
- python3 [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py \
- model_path="<PRETRAINED MODEL PATH>" \
- dataset_manifest="<TEST MANIFEST JSON>" \
- output_filename=test_with_predictions.json \
- batch_size=1
- ```
+ Evaluation script: [evaluation.sh](evaluation.sh)

  **Test Dataset WER/CER 69.25%/42.13%**
  ## Inference
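
The Training paragraph in the diff above notes that no custom tokenizer was trained; the banglaBERT-large tokenizer was downloaded instead. As a rough illustration only (not part of this commit or of `training.sh`), the sketch below shows one plausible way to fetch that WordPiece tokenizer from the Hugging Face Hub so that `TOKENIZER_PATH` / `model.tokenizer.dir` can point at it. It assumes NeMo's `wpe` tokenizer type can read the folder produced by `save_pretrained` (which contains `vocab.txt`); the actual steps used by the authors may differ.

```python
# Illustrative sketch, not taken from this repository.
# Assumption: NeMo's `wpe` tokenizer only needs the saved folder
# containing the WordPiece vocab.txt.
from transformers import AutoTokenizer

# Fetch the pretrained banglaBERT-large tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert_large")

# Save it locally; this folder is what TOKENIZER_PATH would point to
# in the training command (model.tokenizer.dir=${TOKENIZER_PATH}).
tokenizer.save_pretrained("<TOKENIZER FOLDER>")
```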
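The Evaluation paragraph above says the generated output file holds both ground-truth and predicted strings, from which WER and CER are reported. As a hedged sketch (not the script referenced as `evaluation.sh`), the snippet below computes corpus-level WER and CER with the `jiwer` library, assuming `test_with_predictions.json` is a JSON-lines manifest with NeMo's usual fields `text` (reference) and `pred_text` (hypothesis).

```python
# Illustrative sketch, not taken from this repository.
# Assumption: each line of the output manifest is a JSON object with
# "text" (ground truth) and "pred_text" (model prediction).
import json
import jiwer

references, hypotheses = [], []
with open("test_with_predictions.json", encoding="utf-8") as f:
    for line in f:
        sample = json.loads(line)
        references.append(sample["text"])
        hypotheses.append(sample["pred_text"])

# Corpus-level error rates over all test samples.
print(f"WER: {jiwer.wer(references, hypotheses):.4f}")
print(f"CER: {jiwer.cer(references, hypotheses):.4f}")
```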