license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - common_voice_8_0
metrics:
  - wer
model-index:
  - name: wav2vec2-large-xls-r-1b-frisian-cv-8
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_8_0
          type: common_voice_8_0
          config: fy-NL
          split: validation
          args: fy-NL
        metrics:
          - name: Wer
            type: wer
            value: 0.14290815597771747
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_8_0
          type: common_voice_8_0
          config: fy-NL
          split: test
          args: fy-NL
        metrics:
          - name: Wer
            type: wer
            value: 0.1413499060557884

wav2vec2-large-xls-r-1b-frisian-cv-8

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the fy-NL (Frisian) configuration of the common_voice_8_0 dataset. It achieves the following results on the evaluation (validation) set:

  • Loss: 0.2131
  • WER: 0.1429

And on the test set:

  • WER: 0.1413
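
For readers unfamiliar with the metric: word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that definition, not the evaluation code used for this model. The example sentences are made up for illustration and are not taken from Common Voice.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences: the hypothesis drops one of five reference words.
reference = "it is moai waar hjoed"
hypothesis = "it is moai hjoed"
print(word_error_rate(reference, hypothesis))
```

Corpus-level scores such as the ones reported here are usually computed by summing errors and reference words over all utterances, rather than by averaging per-sentence rates.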

Model description

This model was developed for my Master's thesis in "Voice Technology" at Rijksuniversiteit Groningen - Campus Fryslân. It corresponds to experiment 1, in which I use the same training set as the XLSR-53 baseline.

Intended uses & limitations

This model is intended for recognizing Frisian speech.
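
As a rough usage sketch, the model can probably be loaded through the Transformers `pipeline` API for automatic speech recognition. The repository ID below is an assumption and may need adjusting, and the audio path in the usage note is hypothetical.

```python
# Assumed Hub repository ID; it is not stated in this card.
MODEL_ID = "greenw0lf/wav2vec2-large-xls-r-1b-frisian-cv-8"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the Transformers ASR pipeline."""
    # Imported here so that defining the function does not require
    # transformers (and a model download) until it is actually called.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` with a local recording should return the transcript as a string. Decoding a file path this way relies on ffmpeg being installed, and XLS-R models expect 16 kHz audio.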

Limitations: no language model (LM) rescoring is applied during decoding, and the model was trained on version 8.0 of Common Voice rather than 13.0.
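
Without LM rescoring, a CTC model's transcript typically comes from greedy decoding: take the most likely token per audio frame, collapse consecutive repeats, and drop the blank token. The sketch below illustrates that procedure on a toy vocabulary. The vocabulary, the frame IDs, and the choice of `<pad>` as blank and `|` as word delimiter are assumptions for illustration; the real model uses its own tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame predictions, drop blanks, and join tokens."""
    # Step 1: merge runs of identical consecutive predictions.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: remove the blank token, which only separates repeats.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # Step 3: turn the word delimiter into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Toy vocabulary and per-frame argmax IDs (hypothetical).
vocab = {0: "<pad>", 1: "|", 2: "h", 3: "o", 4: "i"}
frames = [2, 2, 0, 3, 3, 3, 0, 0, 4, 1, 1, 2, 0, 3, 3, 4, 4]
print(ctc_greedy_decode(frames, vocab))  # hoi hoi
```

Because each frame is decided independently, nothing in this procedure corrects spellings that are acoustically plausible but linguistically unlikely, which is the gap LM rescoring is meant to close.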

Training and evaluation data

The training and evaluation splits used are the ones available in the Common Voice 8.0 Frisian subset.

Training procedure

The script used for training this model can be found in this GitHub repository: link.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 50
  • mixed_precision_training: Native AMP
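
To make the scheduler settings concrete, here is a small sketch of a linear schedule with warmup using the values listed above. The total of 5800 steps is taken from the last row of the training log; the function itself is an illustration of the usual linear warmup and decay shape, not the training script's code.

```python
def linear_schedule_lr(step, total_steps=5800, warmup_ratio=0.1, peak_lr=5e-5):
    """Learning rate at a given optimizer step for linear warmup then decay."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: rise linearly from 0 to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from the peak to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 290, 580, 3190, 5800):
    print(step, linear_schedule_lr(step))
```

With these settings the warmup lasts 580 steps (about the first 5 epochs), after which the rate decays to zero at step 5800.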

Training results

| Training Loss | Epoch | Step | Validation Loss | WER    |
|---------------|-------|------|-----------------|--------|
| 6.0565        | 1.72  | 200  | 3.1053          | 1.0    |
| 2.7675        | 3.45  | 400  | 1.1551          | 0.8611 |
| 1.3474        | 5.17  | 600  | 0.4770          | 0.4397 |
| 0.9617        | 6.9   | 800  | 0.3218          | 0.3343 |
| 0.9058        | 8.62  | 1000 | 0.2741          | 0.2768 |
| 0.9712        | 10.34 | 1200 | 0.2619          | 0.2505 |
| 0.6908        | 12.07 | 1400 | 0.2288          | 0.2243 |
| 0.745         | 13.79 | 1600 | 0.2288          | 0.2095 |
| 0.7742        | 15.52 | 1800 | 0.2289          | 0.1979 |
| 0.7231        | 17.24 | 2000 | 0.2198          | 0.1940 |
| 0.6475        | 18.97 | 2200 | 0.2180          | 0.1992 |
| 0.6421        | 20.69 | 2400 | 0.2133          | 0.1741 |
| 0.5925        | 22.41 | 2600 | 0.1998          | 0.1747 |
| 0.5608        | 24.14 | 2800 | 0.2212          | 0.1950 |
| 0.5315        | 25.86 | 3000 | 0.2187          | 0.1624 |
| 0.5362        | 27.59 | 3200 | 0.2057          | 0.1718 |
| 0.563         | 29.31 | 3400 | 0.2090          | 0.1613 |
| 0.4218        | 31.03 | 3600 | 0.2126          | 0.1531 |
| 0.3826        | 32.76 | 3800 | 0.2084          | 0.1538 |
| 0.356         | 34.48 | 4000 | 0.2115          | 0.1612 |
| 0.2966        | 36.21 | 4200 | 0.2093          | 0.1536 |
| 0.3377        | 37.93 | 4400 | 0.2061          | 0.1527 |
| 0.321         | 39.66 | 4600 | 0.2121          | 0.1463 |
| 0.2942        | 41.38 | 4800 | 0.2158          | 0.1441 |
| 0.2931        | 43.1  | 5000 | 0.2173          | 0.1446 |
| 0.2346        | 44.83 | 5200 | 0.2152          | 0.1436 |
| 0.2543        | 46.55 | 5400 | 0.2066          | 0.1445 |
| 0.2385        | 48.28 | 5600 | 0.2108          | 0.1432 |
| 0.2726        | 50.0  | 5800 | 0.2131          | 0.1429 |
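
The Epoch and Step columns of the log are related by simple arithmetic, sketched below. The step and epoch counts come from the log and the hyperparameter list. The bounds on the training set size are only an estimate and assume a single device, no gradient accumulation, and a final partial batch that is kept; none of these is stated in this card.

```python
# Values read from the hyperparameter list and the last row of the training log.
total_steps = 5800
num_epochs = 50
train_batch_size = 32

steps_per_epoch = total_steps // num_epochs

# Assumption: one device, no gradient accumulation, last partial batch kept.
max_examples = steps_per_epoch * train_batch_size
min_examples = (steps_per_epoch - 1) * train_batch_size + 1


def epoch_at(step: int) -> float:
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 2)


print(steps_per_epoch, min_examples, max_examples)  # 116 3681 3712
print(epoch_at(200), epoch_at(1000))  # 1.72 8.62
```

The computed epochs match the logged values at steps 200 and 1000, which suggests the log evaluates every 200 steps with 116 steps per epoch.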

Framework versions

  • Transformers 4.28.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.11.0
  • Tokenizers 0.13.3