metadata

language:
  - hi
license: apache-2.0
tags:
  - automatic-speech-recognition
  - hf-asr-leaderboard
  - robust-speech-event
datasets:
  - mozilla-foundation/common_voice_7_0
metrics:
  - wer
model-index:
  - name: wav2vec2-large-xls-r-300m-hi-wx1
    results:
      - task:
          type: automatic-speech-recognition
          name: Speech Recognition
        dataset:
          type: mozilla-foundation/common_voice_7_0
          name: Common Voice 7
          args: hi
        metrics:
          - type: wer
            value: 37.19684845500431
            name: Test WER
          - name: Test CER
            type: cer
            value: 11.763235514672798

wav2vec2-large-xls-r-300m-hi-wx1

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - HI dataset. It achieves the following results on the evaluation set:

Loss: 0.6552
Wer: 0.3200
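
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The snippet below is a minimal sketch of that definition for a single utterance pair; the function name `word_error_rate` and the sample strings are illustrative and are not taken from the evaluation script. Note that the value above appears to be a fraction (0.3200), while the Test WER in the metadata appears to be a percentage (37.19...).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only: one substitution and one deletion
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

A corpus-level score is normally computed by summing the edit counts over all utterances before dividing, so averaging per-utterance values from this sketch will not necessarily reproduce the reported number.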

Evaluation Commands

To evaluate on mozilla-foundation/common_voice_7_0 with the test split:

python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-hi-wx1 --dataset mozilla-foundation/common_voice_7_0 --config hi --split test --log_outputs

To evaluate on speech-recognition-community-v2/dev_data

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.00024
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1800
num_epochs: 50
mixed_precision_training: Native AMP
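
As a rough illustration of how these values fit together, the sketch below recomputes the effective batch size and evaluates a linear warmup followed by linear decay schedule in plain Python. It is not the training code. `TOTAL_STEPS` is an assumption estimated from the results table (about 147 optimizer steps per epoch over 50 epochs), the single-device default is inferred from 16 x 2 = 32, and the helper names are hypothetical.

```python
LEARNING_RATE = 0.00024
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 1800
TOTAL_STEPS = 7350  # assumption: ~147 steps/epoch * 50 epochs, not stated in the card


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device * accumulation * num_devices


def linear_lr(step: int,
              base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))  # 32
for step in (0, 900, 1800, 4000, 7200):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 1800, which is roughly a quarter of the way through training.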

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
12.2663	1.36	200	5.9245	1.0
4.1856	2.72	400	3.4968	1.0
3.3908	4.08	600	2.9970	1.0
1.5444	5.44	800	0.9071	0.6139
0.7237	6.8	1000	0.6508	0.4862
0.5323	8.16	1200	0.6217	0.4647
0.4426	9.52	1400	0.5785	0.4288
0.3933	10.88	1600	0.5935	0.4217
0.3532	12.24	1800	0.6358	0.4465
0.3319	13.6	2000	0.5789	0.4118
0.2877	14.96	2200	0.6163	0.4056
0.2663	16.33	2400	0.6176	0.3893
0.2511	17.68	2600	0.6065	0.3999
0.2275	19.05	2800	0.6183	0.3842
0.2098	20.41	3000	0.6486	0.3864
0.1943	21.77	3200	0.6365	0.3885
0.1877	23.13	3400	0.6013	0.3677
0.1679	24.49	3600	0.6451	0.3795
0.1667	25.85	3800	0.6410	0.3635
0.1514	27.21	4000	0.6000	0.3577
0.1453	28.57	4200	0.6020	0.3518
0.134	29.93	4400	0.6531	0.3517
0.1354	31.29	4600	0.6874	0.3578
0.1224	32.65	4800	0.6519	0.3492
0.1199	34.01	5000	0.6553	0.3490
0.1077	35.37	5200	0.6621	0.3429
0.0997	36.73	5400	0.6641	0.3413
0.0964	38.09	5600	0.6722	0.3385
0.0931	39.45	5800	0.6365	0.3363
0.0944	40.81	6000	0.6454	0.3326
0.0862	42.18	6200	0.6497	0.3256
0.0848	43.54	6400	0.6599	0.3226
0.0793	44.89	6600	0.6625	0.3232
0.076	46.26	6800	0.6463	0.3186
0.0749	47.62	7000	0.6559	0.3225
0.0663	48.98	7200	0.6552	0.3200
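
The final row is not the lowest-WER row in the log. The sketch below shows one way to pick the logged step with the lowest WER. It parses only the last four rows of the table above, copied as whitespace-separated text; the field names and the `parse_rows` helper are illustrative, not part of any training script.

```python
# Last four rows of the results table, whitespace-separated:
# training loss, epoch, step, validation loss, WER
LOG_ROWS = """
0.0793 44.89 6600 0.6625 0.3232
0.076  46.26 6800 0.6463 0.3186
0.0749 47.62 7000 0.6559 0.3225
0.0663 48.98 7200 0.6552 0.3200
"""


def parse_rows(text: str) -> list[dict]:
    """Turn each non-empty line into a dict with typed fields."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, wer = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "wer": float(wer),
        })
    return rows


rows = parse_rows(LOG_ROWS)
best = min(rows, key=lambda r: r["wer"])  # row with the lowest WER
print(best["step"], best["wer"])  # 6800 0.3186
```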

Framework versions

Transformers 4.16.2
PyTorch 1.10.0+cu111
Datasets 1.18.3
Tokenizers 0.11.0