metadata

license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - common_voice_8_0
metrics:
  - wer
model-index:
  - name: wav2vec2-large-xls-r-1b-frisian-cv-8
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_8_0
          type: common_voice_8_0
          config: fy-NL
          split: validation
          args: fy-NL
        metrics:
          - name: Wer
            type: wer
            value: 0.14290815597771747
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_8_0
          type: common_voice_8_0
          config: fy-NL
          split: test
          args: fy-NL
        metrics:
          - name: Wer
            type: wer
            value: 0.1413499060557884

wav2vec2-large-xls-r-1b-frisian-cv-8

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the Frisian (fy-NL) subset of the common_voice_8_0 dataset. It achieves the following results on the validation split at the end of training:

Loss: 0.2131
Wer: 0.1429

And on the test set:

Wer: 0.1413
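
For readers unfamiliar with the metric, WER (word error rate) is the number of word-level substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal sketch of that definition, not the evaluation code used for this model. The function names and the toy sentences are made up for illustration.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two sentences, counted in words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def corpus_wer(pairs) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_edit_distance(ref, hyp) for ref, hyp in pairs)
    ref_words = sum(len(ref.split()) for ref, _ in pairs)
    if ref_words == 0:
        raise ValueError("the references must contain at least one word")
    return errors / ref_words


# Toy example (made-up strings): one substitution and one deletion in four words.
example_wer = corpus_wer([("a b c d", "a x c")])
```

A WER of 0.1413 therefore means roughly 14 word errors per 100 reference words.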

Model description

I developed this model for my Master's thesis in "Voice Technology" at Rijksuniversiteit Groningen - Campus Fryslân. It corresponds to experiment 1, in which I use the same training set as the XLSR-53 baseline.

Intended uses & limitations

The model is intended for automatic speech recognition of Frisian.

Limitations: the output is not rescored with a language model (LM), and the model was trained on version 8.0 of Common Voice rather than the more recent 13.0.
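
A minimal inference sketch using the Transformers `pipeline` API is shown below. It decodes without a language model. The `MODEL_ID` value is a placeholder: the Hub namespace is not stated in this card and has to be filled in. The helper name `transcribe` and the file name in the usage note are likewise hypothetical.

```python
# Placeholder: replace "<namespace>" with the Hub namespace hosting this model.
MODEL_ID = "<namespace>/wav2vec2-large-xls-r-1b-frisian-cv-8"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the fine-tuned model (hypothetical helper)."""
    if model_id.startswith("<"):
        raise ValueError("Replace the placeholder namespace in MODEL_ID first.")
    # Imported lazily so this module can be loaded without the heavy dependency.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # For a file path input, the pipeline decodes and resamples the audio
    # (this requires ffmpeg) and returns a dict with the text under "text".
    return asr(audio_path)["text"]
```

Once the placeholder is filled in, a call such as `transcribe("sample.wav")` should return the recognized Frisian text as a string.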

Training and evaluation data

I used the training and evaluation splits that ship with the Common Voice 8.0 Frisian subset.

Training procedure

The script used for training this model can be found in this GitHub repository: link.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 32
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 50
mixed_precision_training: Native AMP
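
To make the scheduler settings above concrete, here is a small sketch of a linear schedule with warmup: the learning rate rises linearly from 0 to the peak over the warmup steps, then decays linearly to 0. It assumes 5800 total optimizer steps, which is the final step in the training results table, and it is an illustration in plain Python rather than the Trainer's own implementation. The function name is made up.

```python
import math


def linear_warmup_lr(step: int,
                     total_steps: int = 5800,   # assumed: final step in the results table
                     peak_lr: float = 5e-05,    # learning_rate
                     warmup_ratio: float = 0.1  # lr_scheduler_warmup_ratio
                     ) -> float:
    """Learning rate at a given optimizer step for linear warmup plus linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at a few points of the run.
schedule_preview = {step: linear_warmup_lr(step) for step in (0, 290, 580, 3190, 5800)}
```

Under these assumptions the warmup lasts 580 steps, so the peak learning rate is reached at about epoch 5 of 50.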

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
6.0565	1.72	200	3.1053	1.0
2.7675	3.45	400	1.1551	0.8611
1.3474	5.17	600	0.4770	0.4397
0.9617	6.9	800	0.3218	0.3343
0.9058	8.62	1000	0.2741	0.2768
0.9712	10.34	1200	0.2619	0.2505
0.6908	12.07	1400	0.2288	0.2243
0.745	13.79	1600	0.2288	0.2095
0.7742	15.52	1800	0.2289	0.1979
0.7231	17.24	2000	0.2198	0.1940
0.6475	18.97	2200	0.2180	0.1992
0.6421	20.69	2400	0.2133	0.1741
0.5925	22.41	2600	0.1998	0.1747
0.5608	24.14	2800	0.2212	0.1950
0.5315	25.86	3000	0.2187	0.1624
0.5362	27.59	3200	0.2057	0.1718
0.563	29.31	3400	0.2090	0.1613
0.4218	31.03	3600	0.2126	0.1531
0.3826	32.76	3800	0.2084	0.1538
0.356	34.48	4000	0.2115	0.1612
0.2966	36.21	4200	0.2093	0.1536
0.3377	37.93	4400	0.2061	0.1527
0.321	39.66	4600	0.2121	0.1463
0.2942	41.38	4800	0.2158	0.1441
0.2931	43.1	5000	0.2173	0.1446
0.2346	44.83	5200	0.2152	0.1436
0.2543	46.55	5400	0.2066	0.1445
0.2385	48.28	5600	0.2108	0.1432
0.2726	50.0	5800	0.2131	0.1429
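
The table can also be inspected programmatically, for example to see which evaluation step gave the lowest WER and which gave the lowest validation loss. The sketch below copies the step, validation loss, and WER columns from the table above. The variable names are made up for illustration.

```python
# (step, validation_loss, wer) copied from the training results table.
rows = [
    (200, 3.1053, 1.0), (400, 1.1551, 0.8611), (600, 0.4770, 0.4397),
    (800, 0.3218, 0.3343), (1000, 0.2741, 0.2768), (1200, 0.2619, 0.2505),
    (1400, 0.2288, 0.2243), (1600, 0.2288, 0.2095), (1800, 0.2289, 0.1979),
    (2000, 0.2198, 0.1940), (2200, 0.2180, 0.1992), (2400, 0.2133, 0.1741),
    (2600, 0.1998, 0.1747), (2800, 0.2212, 0.1950), (3000, 0.2187, 0.1624),
    (3200, 0.2057, 0.1718), (3400, 0.2090, 0.1613), (3600, 0.2126, 0.1531),
    (3800, 0.2084, 0.1538), (4000, 0.2115, 0.1612), (4200, 0.2093, 0.1536),
    (4400, 0.2061, 0.1527), (4600, 0.2121, 0.1463), (4800, 0.2158, 0.1441),
    (5000, 0.2173, 0.1446), (5200, 0.2152, 0.1436), (5400, 0.2066, 0.1445),
    (5600, 0.2108, 0.1432), (5800, 0.2131, 0.1429),
]

# Evaluation step with the lowest word error rate.
best_by_wer = min(rows, key=lambda row: row[2])
# Evaluation step with the lowest validation loss.
best_by_loss = min(rows, key=lambda row: row[1])

print(f"lowest WER:  {best_by_wer[2]} at step {best_by_wer[0]}")
print(f"lowest loss: {best_by_loss[1]} at step {best_by_loss[0]}")
```

The two criteria disagree: the lowest WER is reached at the final step (5800), while the lowest validation loss occurs much earlier, at step 2600.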

Framework versions

Transformers 4.28.1
Pytorch 2.0.0+cu117
Datasets 2.11.0
Tokenizers 0.13.3