---
license: apache-2.0
language: as
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning
- as
- robust-speech-event
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_7_0
model-index:
- name: XLS-R-300M - Assamese
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 7
      type: mozilla-foundation/common_voice_7_0
      args: as
    metrics:
    - name: Test WER
      type: wer
      value: 72.64
    - name: Test CER
      type: cer
      value: 27.35
---
# wav2vec2-large-xls-r-300m-assamese
This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the Assamese (`as`) subset of the mozilla-foundation/common_voice_7_0 dataset.
It achieves the following results on the evaluation set:
- WER: 0.7954545454545454
- CER: 0.32341269841269843
## Model description
More information needed
## Intended uses & limitations
More information needed
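No usage example was published with this checkpoint. As a rough sketch of how a fine-tuned XLS-R CTC model is typically loaded for inference with 🤗 Transformers (the repository id and the audio file name below are placeholders, not taken from this card):
```python
# Hedged usage sketch: standard Wav2Vec2 CTC inference with Transformers.
# "<user>/wav2vec2-large-xls-r-300m-assamese" and "sample_as.wav" are placeholders.
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "<user>/wav2vec2-large-xls-r-300m-assamese"  # placeholder repo id
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# XLS-R expects 16 kHz mono audio.
speech, _ = librosa.load("sample_as.wav", sr=16_000, mono=True)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```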
## Training and evaluation data
To compute the evaluation metrics on the Common Voice 7.0 Assamese test split, run:
```bash
cd wav2vec2-large-xls-r-300m-assamese; python eval.py --model_id ./ --dataset mozilla-foundation/common_voice_7_0 --config as --split test --log_outputs
```
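The `eval.py` script bundled with the repository handles transcription and scoring; its exact contents are not reproduced here. As a rough sketch of the scoring step only, using the `jiwer` package (the variable names and the use of `jiwer` are assumptions, not the script's actual code):
```python
# Hedged sketch of the WER/CER scoring step; not the actual eval.py code.
from jiwer import wer, cer

predictions = ["..."]  # placeholder: transcriptions produced by the model
references = ["..."]   # placeholder: reference sentences from the test split

print(f"WER: {100 * wer(references, predictions):.2f}")
print(f"CER: {100 * cer(references, predictions):.2f}")
```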
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-4
- train_batch_size: 16
- eval_batch_size: 8
- seed: not given
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 400
- mixed_precision_training: Native AMP
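These values correspond roughly to the following 🤗 `TrainingArguments`; this is a hedged reconstruction from the list above, not the original training script (the output directory and anything not listed are assumptions or library defaults):
```python
# Hedged reconstruction of the training configuration from the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-assamese",  # assumed output path
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=400,
    fp16=True,                       # "Native AMP" mixed precision
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the library defaults.
)
```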
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:------:|:----:|:---------------:|:------:|
| 1.584065 | NA | 400 | 1.584065 | 0.915512 |
| 1.658865 | NA | 800 | 1.658865 | 0.805096 |
| 1.882352 | NA | 1200 | 1.882352 | 0.820742 |
| 1.881240 | NA | 1600 | 1.881240 | 0.810907 |
| 2.159748 | NA | 2000 | 2.159748 | 0.804202 |
| 1.992871 | NA | 2400 | 1.992871 | 0.803308 |
| 2.201436 | NA | 2800 | 2.201436 | 0.802861 |
| 2.165218 | NA | 3200 | 2.165218 | 0.793920 |
| 2.253643 | NA | 3600 | 2.253643 | 0.796603 |
| 2.265880 | NA | 4000 | 2.265880 | 0.790344 |
| 2.293935 | NA | 4400 | 2.293935 | 0.797050 |
| 2.288851 | NA | 4800 | 2.288851 | 0.784086 |
### Framework versions
- Transformers 4.11.3
- Pytorch 1.10.0+cu113
- Datasets 1.13.3
- Tokenizers 0.10.3