wav2vec 2.0 XLSR-53 Model
This is the wav2vec 2.0 XLSR-53 model fine-tuned on the Common Voice 8.0 dataset for Bahasa Indonesia using the train, validation, and other splits (~32,000 sound samples). This model was used for research purposes to complete my undergraduate thesis.
Preprocessing
- Remove symbols from the transcripts
- Convert digits (0, 1, ..., 9) to word forms (nol, satu, ..., sembilan)
- Convert all characters to lowercase
- Resample the audio data to 16 kHz
- Use the data collator from this example
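The text-side steps above (symbol removal, digit conversion, lowercasing) can be sketched as follows. This is a minimal illustration, not the author's script: the function name `normalize_transcript`, the digit table, and the set of characters treated as symbols are all assumptions. Audio resampling and the data collator are not covered here.

```python
import re

# Word forms for the digits 0-9 in Bahasa Indonesia. The exact table used
# for this model is not shown in the card, so this one is an assumption.
DIGIT_WORDS = {
    "0": "nol", "1": "satu", "2": "dua", "3": "tiga", "4": "empat",
    "5": "lima", "6": "enam", "7": "tujuh", "8": "delapan", "9": "sembilan",
}


def normalize_transcript(text: str) -> str:
    """Lowercase, spell out digits one by one, and drop symbols."""
    text = text.lower()
    # Replace each ASCII digit with its word form, padded with spaces so
    # that adjacent digits and letters do not fuse together.
    text = re.sub(r"[0-9]", lambda m: f" {DIGIT_WORDS[m.group()]} ", text)
    # Keep only lowercase letters, apostrophes, and spaces
    # (assumed definition of "symbol": everything else).
    text = re.sub(r"[^a-z' ]+", " ", text)
    # Collapse repeated whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_transcript("Saya punya 2 kucing!"))
```

Digits are converted one at a time, so a multi-digit number such as 19 becomes "satu sembilan" rather than "sembilan belas"; that matches the digit-level description in the list but may differ from what the original preprocessing did.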
Hyperparameters used
- Learning rate = 1e-4
- Maximum Epochs = 30
- Batch size = 4 (limited by compute resources)
- Early stopping = Stop when WER doesn't improve for 2 validations
- Other parameters use the defaults from this config
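To make the early-stopping rule concrete, here is a small, framework-free sketch. The class name `WerEarlyStopping`, the `HPARAMS` dictionary keys, and the validation history are hypothetical; treating "2 validations" as two consecutive non-improving validations is also an assumption.

```python
class WerEarlyStopping:
    """Signal a stop when WER has not improved for `patience` validations."""

    def __init__(self, patience: int = 2):
        self.patience = patience
        self.best_wer = float("inf")
        self.bad_validations = 0

    def update(self, wer: float) -> bool:
        """Record one validation WER; return True if training should stop."""
        if wer < self.best_wer:
            # Improvement: remember it and reset the counter.
            self.best_wer = wer
            self.bad_validations = 0
        else:
            self.bad_validations += 1
        return self.bad_validations >= self.patience


# Hyperparameters from the list above, gathered in one place
# (key names are illustrative, not taken from the referenced config).
HPARAMS = {"learning_rate": 1e-4, "max_epochs": 30, "batch_size": 4}

# Hypothetical validation WER history, for illustration only.
stopper = WerEarlyStopping(patience=2)
history = [0.40, 0.30, 0.31, 0.32, 0.20]
stopped_at = next(
    (i for i, wer in enumerate(history) if stopper.update(wer)), None
)
print(stopped_at, stopper.best_wer)
```

If training runs through the Hugging Face `Trainer`, `EarlyStoppingCallback(early_stopping_patience=2)` provides comparable behaviour; whether it was used for this model is not stated.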
Results
The results are an average of 5 runs using the test split from the Common Voice dataset for Bahasa Indonesia.
Test Result: 15.6% WER
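For readers unfamiliar with the metric, word error rate is the word-level edit distance between the reference and the model output, divided by the number of reference words. The sketch below is one way to compute it and to average over runs. The function names are hypothetical, and pooling errors over the whole test set (rather than averaging per-utterance scores) is an assumption about how the figure was obtained.

```python
from statistics import mean


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs) -> float:
    """WER over (reference, hypothesis) pairs; needs at least one reference word."""
    errors = total = 0
    for ref, hyp in pairs:
        e, n = word_errors(ref, hyp)
        errors += e
        total += n
    return errors / total


def average_wer(run_wers) -> float:
    """Mean WER across independent runs."""
    return mean(run_wers)
```

In this sketch, `corpus_wer` would be called once per run on that run's test-set predictions, and `average_wer` applied to the five resulting values.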