---
language:
- sv
license: apache-2.0
tags:
- automatic-speech-recognition
- robust-speech-event
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_8_0
metrics:
- wer
- cer
base_model: facebook/wav2vec2-xls-r-300m
model-index:
- name: wav2vec2-xls-r-300m-swedish
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Common Voice sv-SE
      type: mozilla-foundation/common_voice_8_0
      args: sv-SE
    metrics:
    - type: wer
      value: 24.73
      name: Test WER
    - type: cer
      value: 7.58
      name: Test CER
---
# wav2vec2-large-xls-r-300m-Swedish
This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the Swedish (sv-SE) split of the mozilla-foundation/common_voice_8_0 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3641
- WER: 0.2473
- CER: 0.0758
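## Usage
The checkpoint can be used through the standard `transformers` ASR pipeline. A minimal sketch (not part of the original card): the model id below is assumed from the card metadata, so prepend the owner's namespace on the Hub, and `audio.wav` is a placeholder for a 16 kHz Swedish recording.
```python
# Minimal inference sketch: transcribe a Swedish audio file with this checkpoint.
# "wav2vec2-xls-r-300m-swedish" is assumed from the card metadata, and
# "audio.wav" is a placeholder input path.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="wav2vec2-xls-r-300m-swedish")
print(asr("audio.wav")["text"])
```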
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 7.5e-05
- train_batch_size: 64
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 50
- mixed_precision_training: Native AMP
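The training script itself is not included in the card; as a hedged sketch, the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows. The `output_dir` and any argument not listed above are assumptions.
```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
# The actual training script is not part of this card; output_dir is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-300m-swedish",  # assumed
    learning_rate=7.5e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # 64 x 2 = total train batch size 128
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=50,
    fp16=True,  # "Native AMP" mixed-precision training
)
```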
### Training results
| Training Loss | Epoch | Step | Validation Loss | WER | CER |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|
| 6.1097 | 5.49 | 500 | 3.1422 | 1.0 | 1.0 |
| 2.985 | 10.98 | 1000 | 1.7357 | 0.9876 | 0.4125 |
| 1.0363 | 16.48 | 1500 | 0.4773 | 0.3510 | 0.1047 |
| 0.6111 | 21.97 | 2000 | 0.3937 | 0.2998 | 0.0910 |
| 0.4942 | 27.47 | 2500 | 0.3779 | 0.2776 | 0.0844 |
| 0.4421 | 32.96 | 3000 | 0.3745 | 0.2630 | 0.0807 |
| 0.4018 | 38.46 | 3500 | 0.3685 | 0.2553 | 0.0781 |
| 0.3759 | 43.95 | 4000 | 0.3618 | 0.2488 | 0.0761 |
| 0.3646 | 49.45 | 4500 | 0.3641 | 0.2473 | 0.0758 |
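To score transcriptions with the same metrics as the table, one option is the `jiwer` package, which implements both WER and CER. A hedged sketch; the two strings below are illustrative placeholders, not data from the card:
```python
# Hedged sketch: scoring a model transcription against a reference with jiwer,
# which implements the WER and CER metrics reported above.
import jiwer

reference = "det här är ett exempel"   # ground-truth transcript (placeholder)
hypothesis = "det har är ett exempel"  # model output (placeholder)

print(f"WER: {jiwer.wer(reference, hypothesis):.4f}")
print(f"CER: {jiwer.cer(reference, hypothesis):.4f}")
```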
### Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0