metadata

license: apache-2.0
base_model: facebook/wav2vec2-xls-r-300m
tags:
  - generated_from_trainer
datasets:
  - common_voice_17_0
metrics:
  - wer
model-index:
  - name: xlsr-128upper-sorbian
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_17_0
          type: common_voice_17_0
          config: hsb
          split: validation
          args: hsb
        metrics:
          - name: Wer
            type: wer
            value: 0.549367088607595

xlsr-128upper-sorbian

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the hsb (Upper Sorbian) configuration of the common_voice_17_0 dataset. It achieves the following results on the evaluation set (the validation split) at the final logged step, 1200:

Loss: 0.7110
Wer: 0.5494
Cer: 0.1188
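For readers unfamiliar with the two error rates, here is a minimal sketch of how WER and CER are commonly defined: edit distance between reference and hypothesis, divided by reference length, over words and characters respectively. This card does not say which metric implementation was used, so the function names and the toy English sentences below are illustrative assumptions, and details such as text normalization or whether spaces count as characters may differ from the actual evaluation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Word error rate: total word-level edits / total reference words."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return errors / total


def cer(references, hypotheses):
    """Character error rate: total character-level edits / total reference
    characters. Assumption: spaces are counted as characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total


# Toy data (hypothetical, not taken from Common Voice).
refs = ["the cat sat", "hello world"]
hyps = ["the cat sit", "hello world"]
print(f"WER = {wer(refs, hyps):.4f}")
print(f"CER = {cer(refs, hyps):.4f}")
```

A single wrong letter counts as one whole word error but only one character error, which is why a model can show a much higher WER than CER, as in the figures above.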

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 50
mixed_precision_training: Native AMP
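The relationship between some of these values can be sketched in plain Python. The block below shows how total_train_batch_size follows from train_batch_size and gradient_accumulation_steps, and what a linear schedule with 500 warmup steps looks like. The function names are hypothetical, a single device is assumed, and total_steps=1250 is an assumption made for illustration only; the card does not state the total number of optimizer steps.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Samples contributing to one optimizer update.
    num_devices=1 is an assumption; the card does not list the device count."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_warmup_decay(step, base_lr=3e-4, warmup_steps=500, total_steps=1250):
    """Learning rate at a given optimizer step for a linear schedule:
    ramp linearly from 0 to base_lr over warmup_steps, then decay
    linearly to 0 at total_steps.
    total_steps=1250 is a hypothetical value, not taken from the card."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(16, 2))  # matches total_train_batch_size: 32
for s in (0, 250, 500, 875, 1250):
    print(s, linear_warmup_decay(s))
```

Under these assumptions the learning rate peaks at 0.0003 at step 500, so a large share of the run is spent in warmup.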

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
3.8492	3.9216	100	3.9919	1.0	1.0
3.1983	7.8431	200	3.2332	1.0	1.0
2.9601	11.7647	300	3.0166	0.9873	0.9798
0.4618	15.6863	400	0.7749	0.7557	0.1917
0.2411	19.6078	500	0.7812	0.7013	0.1702
0.1112	23.5294	600	0.7275	0.6405	0.1508
0.1108	27.4510	700	0.7995	0.6247	0.1440
0.0432	31.3725	800	0.7902	0.6139	0.1432
0.0431	35.2941	900	0.7615	0.5797	0.1372
0.0515	39.2157	1000	0.7029	0.5456	0.1234
0.0241	43.1373	1100	0.7296	0.5285	0.1188
0.0342	47.0588	1200	0.7110	0.5494	0.1188
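The table can be inspected programmatically, for example to compare the final checkpoint with the best logged one. The sketch below copies the rows from the table above into a list; the EvalRow type and variable names are illustrative choices, not part of the training code.

```python
from collections import namedtuple

EvalRow = namedtuple("EvalRow", "train_loss epoch step val_loss wer cer")

# Rows copied from the training results table.
ROWS = [
    EvalRow(3.8492, 3.9216, 100, 3.9919, 1.0, 1.0),
    EvalRow(3.1983, 7.8431, 200, 3.2332, 1.0, 1.0),
    EvalRow(2.9601, 11.7647, 300, 3.0166, 0.9873, 0.9798),
    EvalRow(0.4618, 15.6863, 400, 0.7749, 0.7557, 0.1917),
    EvalRow(0.2411, 19.6078, 500, 0.7812, 0.7013, 0.1702),
    EvalRow(0.1112, 23.5294, 600, 0.7275, 0.6405, 0.1508),
    EvalRow(0.1108, 27.4510, 700, 0.7995, 0.6247, 0.1440),
    EvalRow(0.0432, 31.3725, 800, 0.7902, 0.6139, 0.1432),
    EvalRow(0.0431, 35.2941, 900, 0.7615, 0.5797, 0.1372),
    EvalRow(0.0515, 39.2157, 1000, 0.7029, 0.5456, 0.1234),
    EvalRow(0.0241, 43.1373, 1100, 0.7296, 0.5285, 0.1188),
    EvalRow(0.0342, 47.0588, 1200, 0.7110, 0.5494, 0.1188),
]

best_by_wer = min(ROWS, key=lambda r: r.wer)
best_by_loss = min(ROWS, key=lambda r: r.val_loss)
final = ROWS[-1]

# Optimizer steps per epoch, inferred from the last logged row.
steps_per_epoch = final.step / final.epoch

print(f"lowest WER:      {best_by_wer.wer} at step {best_by_wer.step}")
print(f"lowest val loss: {best_by_loss.val_loss} at step {best_by_loss.step}")
print(f"final row:       WER {final.wer} at step {final.step}")
print(f"steps per epoch: {steps_per_epoch:.1f}")
```

On these rows the lowest WER (0.5285) is logged at step 1100 and the lowest validation loss (0.7029) at step 1000, while the headline figures correspond to the final row at step 1200.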

Framework versions

Transformers 4.42.0.dev0
PyTorch 2.3.1+cu121
Datasets 2.19.2
Tokenizers 0.19.1