---
language:
  - uk
license: mit
tags:
  - automatic-speech-recognition
  - common_voice
  - generated_from_trainer
datasets:
  - common_voice
model-index:
  - name: wav2vec2-xls-r-300m-uk
    results:
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice uk
          type: common_voice
          args: uk
        metrics:
          - name: Test WER
            type: wer
            value: 12.22
---

wav2vec2-xls-r-300m-uk

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Ukrainian (uk) subset of the Common Voice dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0927
  • WER: 0.1222
  • CER: 0.0204
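
For reference, a minimal inference sketch using the transformers pipeline API is shown below. The repository ID robinhad/wav2vec2-xls-r-300m-uk and the sample audio file name are assumptions, not details stated in this card.

```python
# Minimal inference sketch (assumed repo ID: robinhad/wav2vec2-xls-r-300m-uk;
# replace "sample_uk.wav" with a path to your own 16 kHz Ukrainian recording).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="robinhad/wav2vec2-xls-r-300m-uk",
)

result = asr("sample_uk.wav")
print(result["text"])
```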

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned and evaluated on the Ukrainian (uk) subset of the Common Voice dataset, as listed in the card metadata above.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 40
  • eval_batch_size: 40
  • seed: 42
  • gradient_accumulation_steps: 6
  • total_train_batch_size: 240
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 100
  • mixed_precision_training: Native AMP
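
As a rough illustration, the hyperparameters above map onto a transformers TrainingArguments configuration roughly as follows. The output_dir value is an assumption, and the surrounding training script (model, datasets, data collator) is not shown in this card.

```python
# Hypothetical TrainingArguments mirroring the hyperparameter list above.
# output_dir is an assumed name; the optimizer values (Adam betas and epsilon)
# match the Transformers defaults, so they are not set explicitly here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-300m-uk",   # assumed
    learning_rate=3e-5,
    per_device_train_batch_size=40,
    per_device_eval_batch_size=40,
    gradient_accumulation_steps=6,          # effective batch size: 40 * 6 = 240
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=100,
    fp16=True,                              # "Native AMP" mixed-precision training
)
```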

Training results

| Training Loss | Epoch | Step | CER    | Validation Loss | WER    |
|:-------------:|:-----:|:----:|:------:|:---------------:|:------:|
| 9.0008        | 1.68  | 200  | 1.0    | 3.7590          | 1.0    |
| 3.4972        | 3.36  | 400  | 1.0    | 3.3933          | 1.0    |
| 3.3432        | 5.04  | 600  | 1.0    | 3.2617          | 1.0    |
| 3.2421        | 6.72  | 800  | 1.0    | 3.0712          | 1.0    |
| 1.9839        | 7.68  | 1000 | 0.1400 | 0.7204          | 0.6561 |
| 0.8017        | 9.36  | 1200 | 0.0766 | 0.3734          | 0.4159 |
| 0.5554        | 11.04 | 1400 | 0.0583 | 0.2621          | 0.3237 |
| 0.4309        | 12.68 | 1600 | 0.0486 | 0.2085          | 0.2753 |
| 0.3697        | 14.36 | 1800 | 0.0421 | 0.1746          | 0.2427 |
| 0.3293        | 16.04 | 2000 | 0.0388 | 0.1597          | 0.2243 |
| 0.2934        | 17.72 | 2200 | 0.0358 | 0.1428          | 0.2083 |
| 0.2704        | 19.4  | 2400 | 0.0333 | 0.1326          | 0.1949 |
| 0.2547        | 21.08 | 2600 | 0.0322 | 0.1255          | 0.1882 |
| 0.2366        | 22.76 | 2800 | 0.0309 | 0.1211          | 0.1815 |
| 0.2183        | 24.44 | 3000 | 0.0294 | 0.1159          | 0.1727 |
| 0.2115        | 26.13 | 3200 | 0.0280 | 0.1117          | 0.1661 |
| 0.1968        | 27.8  | 3400 | 0.0274 | 0.1063          | 0.1622 |
| 0.1922        | 29.48 | 3600 | 0.0269 | 0.1082          | 0.1598 |
| 0.1847        | 31.17 | 3800 | 0.0260 | 0.1061          | 0.1550 |
| 0.1715        | 32.84 | 4000 | 0.0252 | 0.1014          | 0.1496 |
| 0.1689        | 34.53 | 4200 | 0.0250 | 0.1012          | 0.1492 |
| 0.1655        | 36.21 | 4400 | 0.0243 | 0.0999          | 0.1450 |
| 0.1585        | 37.88 | 4600 | 0.0239 | 0.0967          | 0.1432 |
| 0.1492        | 39.57 | 4800 | 0.0237 | 0.0978          | 0.1421 |
| 0.1491        | 41.25 | 5000 | 0.0236 | 0.0963          | 0.1412 |
| 0.1453        | 42.93 | 5200 | 0.0230 | 0.0979          | 0.1373 |
| 0.1386        | 44.61 | 5400 | 0.0227 | 0.0959          | 0.1353 |
| 0.1387        | 46.29 | 5600 | 0.0226 | 0.0927          | 0.1355 |
| 0.1329        | 47.97 | 5800 | 0.0224 | 0.0951          | 0.1341 |
| 0.1295        | 49.65 | 6000 | 0.0219 | 0.0950          | 0.1306 |
| 0.1287        | 51.33 | 6200 | 0.0216 | 0.0937          | 0.1290 |
| 0.1277        | 53.02 | 6400 | 0.0215 | 0.0963          | 0.1294 |
| 0.1201        | 54.69 | 6600 | 0.0213 | 0.0959          | 0.1282 |
| 0.1199        | 56.38 | 6800 | 0.0215 | 0.0944          | 0.1286 |
| 0.1221        | 58.06 | 7000 | 0.0209 | 0.0938          | 0.1249 |
| 0.1145        | 59.68 | 7200 | 0.0208 | 0.0941          | 0.1254 |
| 0.1143        | 61.36 | 7400 | 0.0209 | 0.0941          | 0.1249 |
| 0.1143        | 63.04 | 7600 | 0.0209 | 0.0940          | 0.1248 |
| 0.1137        | 64.72 | 7800 | 0.0205 | 0.0931          | 0.1234 |
| 0.1125        | 66.4  | 8000 | 0.0204 | 0.0927          | 0.1222 |
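
For context, word and character error rates like those in the table can be computed with the jiwer library. Using jiwer here is an assumption; this card does not state which tooling produced the scores, and the example sentences below are made up.

```python
# Illustrative WER/CER computation with jiwer (assumed tooling, not stated in the card).
import jiwer

references = ["це приклад речення українською мовою"]   # ground-truth transcripts
hypotheses = ["це приклад речень українською мовою"]    # model predictions

print("WER:", jiwer.wer(references, hypotheses))   # fraction of word-level errors
print("CER:", jiwer.cer(references, hypotheses))   # fraction of character-level errors
```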

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.1+cu117
  • Datasets 2.8.0
  • Tokenizers 0.13.2
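
A quick way to confirm a matching environment (a convenience sketch, not part of the original card) is to print the installed versions and compare them against the list above:

```python
# Print installed library versions to compare against the list above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # expected 4.25.1
print("PyTorch:", torch.__version__)              # expected 1.13.1+cu117
print("Datasets:", datasets.__version__)          # expected 2.8.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.13.2
```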