Introduction

This repo contains a pre-trained model trained using https://github.com/k2-fsa/icefall/pull/248.

It is trained on the full LibriSpeech dataset using the pruned RNN-T loss from k2.

How to clone this repo

sudo apt-get install git-lfs
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12

cd icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12
git lfs pull

Caution: You have to run git lfs pull. Otherwise, you will be SAD later.
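
If you skip git lfs pull, the large files stay as tiny Git LFS pointer stubs instead of real checkpoints. A quick sanity check (a minimal sketch; the path assumes this repo's layout):

import os

# A Git LFS pointer file is a small text stub (a few hundred bytes);
# the real checkpoint is hundreds of megabytes.
path = "exp/pretrained.pt"
size = os.path.getsize(path)
print(f"{path}: {size / 1e6:.1f} MB")
if size < 1_000_000:
    raise RuntimeError("Looks like an LFS pointer file; run `git lfs pull`.")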

The model in this repo is trained using the commit 1603744469d167d848e074f2ea98c587153205fa.

You can use the following commands to download icefall:

git clone https://github.com/k2-fsa/icefall
cd icefall
git checkout 1603744469d167d848e074f2ea98c587153205fa

The decoder architecture is modified from the one in Rnn-Transducer with Stateless Prediction Network: a Conv1d layer is placed right after the input embedding layer.


Description

This repo provides a pre-trained Conformer transducer model for the LibriSpeech dataset, built using icefall. There are no RNNs in the decoder: it is stateless and contains only an embedding layer followed by a Conv1d.
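
To make the architecture concrete, here is a minimal PyTorch sketch of such a stateless decoder. It assumes a left context of two previous tokens; the class and parameter names are illustrative, not the exact icefall implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StatelessDecoder(nn.Module):
    """Prediction network with no recurrent state: an embedding layer
    followed by a Conv1d over a fixed number of previous tokens."""

    def __init__(self, vocab_size: int, embed_dim: int, context_size: int = 2):
        super().__init__()
        self.context_size = context_size
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # The Conv1d mixes the embeddings of the last `context_size` tokens,
        # playing the role an LSTM/GRU would in a conventional transducer.
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=context_size)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (N, U) token IDs of the hypothesis so far.
        emb = self.embedding(y).permute(0, 2, 1)   # (N, embed_dim, U)
        # Left-pad so position u sees only tokens <= u (causal context).
        emb = F.pad(emb, (self.context_size - 1, 0))
        out = F.relu(self.conv(emb))               # (N, embed_dim, U)
        return out.permute(0, 2, 1)                # (N, U, embed_dim)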

The commands for training are:

cd egs/librispeech/ASR/
./prepare.sh

export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

. path.sh

./pruned_transducer_stateless/train.py \
  --world-size 8 \
  --num-epochs 60 \
  --start-epoch 0 \
  --exp-dir pruned_transducer_stateless/exp \
  --full-libri 1 \
  --max-duration 300 \
  --prune-range 5 \
  --lr-factor 5 \
  --lm-scale 0.25

The tensorboard training log can be found at https://tensorboard.dev/experiment/WKRFY5fYSzaVBHahenpNlA/

The commands for decoding are:

epoch=42
avg=11
sym=1

# greedy search

./pruned_transducer_stateless/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir ./pruned_transducer_stateless/exp \
  --max-duration 100 \
  --decoding-method greedy_search \
  --beam-size 4 \
  --max-sym-per-frame $sym

# modified beam search
./pruned_transducer_stateless/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir ./pruned_transducer_stateless/exp \
  --max-duration 100 \
  --decoding-method modified_beam_search \
  --beam-size 4

# beam search
# (not recommended)
./pruned_transducer_stateless/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir ./pruned_transducer_stateless/exp \
  --max-duration 100 \
  --decoding-method beam_search \
  --beam-size 4

You can find the decoding logs for the above commands in this repo (in the folder log).
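
The --max-sym-per-frame option caps how many non-blank symbols greedy search may emit at a single encoder frame. Here is a minimal sketch of that loop, assuming a decoder/joiner interface like the sketch above; the function names and shapes are illustrative:

import torch

def greedy_search(encoder_out, decoder, joiner,
                  blank_id: int = 0, context_size: int = 2,
                  max_sym_per_frame: int = 1):
    # encoder_out: (T, C) encoder frames for a single utterance.
    hyp = [blank_id] * context_size          # left context, padded with blanks
    for t in range(encoder_out.size(0)):
        emitted = 0
        while emitted < max_sym_per_frame:
            context = torch.tensor(hyp[-context_size:]).unsqueeze(0)  # (1, cs)
            dec_out = decoder(context)[:, -1]          # (1, C)
            logits = joiner(encoder_out[t], dec_out)   # (vocab_size,)
            y = logits.argmax().item()
            if y == blank_id:
                break                        # blank: move on to the next frame
            hyp.append(y)                    # non-blank: emit, stay on frame
            emitted += 1
    return hyp[context_size:]                # drop the blank padding context

As the table below shows, raising this cap from 1 to 3 leaves the WER unchanged for this model.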

The WERs for the test datasets are:

                                       test-clean   test-other   comment
  greedy search (max sym per frame 1)     2.62        6.37       --epoch 42, --avg 11, --max-duration 100
  greedy search (max sym per frame 2)     2.62        6.37       --epoch 42, --avg 11, --max-duration 100
  greedy search (max sym per frame 3)     2.62        6.37       --epoch 42, --avg 11, --max-duration 100
  modified beam search (beam size 4)      2.56        6.27       --epoch 42, --avg 11, --max-duration 100
  beam search (beam size 4)               2.57        6.27       --epoch 42, --avg 11, --max-duration 100

File description

  • log: this directory contains the decoding logs and decoding results
  • test_wavs: this directory contains wave files for testing the pre-trained model
  • data: this directory contains files generated by prepare.sh
  • exp: this directory contains only one file: pretrained.pt

exp/pretrained.pt is generated by the following command:

epoch=42
avg=11

./pruned_transducer_stateless/export.py \
  --exp-dir ./pruned_transducer_stateless/exp \
  --bpe-model data/lang_bpe_500/bpe.model \
  --epoch $epoch \
  --avg $avg

HINT: To use pretrained.pt to compute the WER for test-clean and test-other, just do the following:

cp icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12/exp/pretrained.pt \
  /path/to/icefall/egs/librispeech/ASR/pruned_transducer_stateless/exp/epoch-999.pt

and pass --epoch 999 --avg 1 to pruned_transducer_stateless/decode.py.
