Model Card for emlinking/wav2vec2-large-xls-r-300m-tsm-asr-v6
An automatic speech recognition model for Taiwanese Southern Min which generates transcriptions in the Tâi-lô orthography.
Model Details
Model Description
An automatic speech recognition model for Taiwanese Southern Min which generates transcriptions in the Tâi-lô orthography.
- Developed by: Eleanor Lin
- Language(s) (NLP): Taiwanese
- Finetuned from model: facebook/wav2vec2-xls-r-300m
Model Sources
- Paper: Babu, A., Wang, C., Tjandra, A., Lakhotia, K., Xu, Q., Goyal, N., ... & Auli, M. (2021). XLS-R: Self-supervised cross-lingual speech representation learning at scale. arXiv preprint arXiv:2111.09296.
Uses
This model can be used to transcribe Taiwanese speech in the Tâi-lô orthography, e.g. to automatically generate transcripts of videos or podcasts.
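A minimal sketch of how transcription might look with the Hugging Face transformers `pipeline` API. It assumes the model repository ships the processor files the pipeline needs, that `transformers` and a PyTorch backend are installed, and that `ffmpeg` is available to decode audio files. The `transcribe` helper and the audio file name are hypothetical and are not part of this model card.

```python
# Model identifier taken from the title of this model card.
MODEL_ID = "emlinking/wav2vec2-large-xls-r-300m-tsm-asr-v6"


def transcribe(audio_path: str) -> str:
    """Return the Tâi-lô transcription predicted for one audio file.

    Hypothetical helper: the import is done lazily so that defining the
    function does not require transformers to be installed.
    """
    from transformers import pipeline

    # Assumption: the repository contains everything the ASR pipeline needs.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)  # the pipeline decodes and resamples the file
    return result["text"]


# Example call (hypothetical file name; downloads the model on first use):
# print(transcribe("podcast_clip.wav"))
```

For more than a handful of files it would be cheaper to build the pipeline once and reuse it rather than rebuilding it on every call as this sketch does.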
Training Details
Training Data
This model is fine-tuned on 9.57 hours of Taiwanese speech (10,949 spoken utterances) from the following sources:
- https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1
- https://sites.ualberta.ca/~johnnewm/TSM/Taiwanese_Southern_Min/TSM.html
- https://sites.google.com/nycu.edu.tw/fsw/home/tat-tts-corpus (samples only)
- https://sites.google.com/nycu.edu.tw/fsw/home/tat-phase-i (samples only)
- https://suisiann-dataset.ithuan.tw/
Training Procedure
Preprocessing
All punctuation except hyphens ("-") is removed from the transcriptions, and the audio is resampled to 16 kHz.
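The text side of this preprocessing could be sketched as follows. This is an illustration under stated assumptions, not the author's script: "punctuation" is taken to mean any character in a Unicode punctuation category, only the ASCII hyphen is kept, and whitespace is collapsed afterwards. The function name `strip_punctuation` is hypothetical. Audio resampling is not shown.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Remove punctuation except the ASCII hyphen, keeping tone diacritics.

    Assumption: a character counts as punctuation when its Unicode general
    category starts with "P". Letters with diacritics and combining marks
    are in other categories, so Tâi-lô tone marks are left untouched.
    """
    kept = [
        ch
        for ch in text
        if ch == "-" or not unicodedata.category(ch).startswith("P")
    ]
    # Collapse whitespace left behind by removed punctuation (an assumption).
    return " ".join("".join(kept).split())


print(strip_punctuation("Tâi-lô, sī siánn?"))
```

Symbols such as "+" or "$" fall outside the Unicode punctuation categories and would survive this filter; whether the original preprocessing treated them differently is not stated in this card.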
Training Hyperparameters
- Training regime: per-device training batch size=8, gradient accumulation steps=2, fp16 16-bit (mixed) precision training, group_by_length=True, learning rate=3e-4, warmup steps=500, epochs=30
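For reference, the settings above can be written out as a plain configuration dictionary. The key names mirror the corresponding fields of Hugging Face `TrainingArguments`; that the author used the `Trainer` API is an assumption, as is the single-GPU setup used for the effective batch size. The helper `effective_batch_size` is hypothetical.

```python
# Hyperparameters as listed in this card, keyed by the TrainingArguments
# field names they most plausibly correspond to (an assumption).
training_config = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "fp16": True,
    "group_by_length": True,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
    "num_train_epochs": 30,
}


def effective_batch_size(config: dict, num_devices: int = 1) -> int:
    """Number of utterances contributing to each optimizer update."""
    return (
        config["per_device_train_batch_size"]
        * config["gradient_accumulation_steps"]
        * num_devices
    )


# Assumption: one GPU, as is typical for a free Colab session.
print(effective_batch_size(training_config))
```

With gradient accumulation over 2 steps, each optimizer update on a single device is computed from 16 utterances rather than 8.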
Testing Data, Factors & Metrics
Testing Data
TAT Speech-to-Speech Translation Benchmark validation set
Metrics
Word error rate
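As a reminder of what the metric measures, word error rate is the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words. The sketch below is a generic implementation, not the evaluation script used for this model. It assumes whitespace tokenization, so a hyphenated Tâi-lô word counts as a single word. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Splitting one hyphenated word into two costs a substitution and an insertion.
print(word_error_rate("guá sī tâi-uân-lâng", "guá sī tâi-uân lâng"))
```

A corpus-level figure is usually obtained by summing edits and reference words over all utterances before dividing; how the reported score was aggregated is not stated in this card.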
Results
Validation set WER = 0.666 (66.6%)
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: Tesla T4 GPU
- Hours used: 10.4
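A rough estimate in the spirit of that calculator multiplies power draw by runtime and by the carbon intensity of the electricity used. The sketch below assumes the GPU ran at the Tesla T4's 70 W rated board power for the whole 10.4 hours, which is an upper bound, and takes the grid carbon intensity as a parameter, because the data center region is not stated in this card. The value 0.5 kg CO2eq/kWh in the example is purely illustrative, and the function names are hypothetical.

```python
T4_RATED_POWER_WATTS = 70.0  # NVIDIA's rated board power for the Tesla T4
HOURS_USED = 10.4            # from this model card


def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy consumed at a constant power draw, in kilowatt-hours."""
    return power_watts * hours / 1000.0


def emissions_kg_co2eq(energy: float, intensity_kg_per_kwh: float) -> float:
    """Emissions for a given energy use and grid carbon intensity."""
    return energy * intensity_kg_per_kwh


energy = energy_kwh(T4_RATED_POWER_WATTS, HOURS_USED)
print(f"Energy upper bound: {energy:.3f} kWh")

# Illustrative intensity only; the real value depends on the region.
print(f"Emissions at 0.5 kg/kWh: {emissions_kg_co2eq(energy, 0.5):.3f} kg CO2eq")
```

This ignores CPU, memory, and data center overhead, so it should be read as an order-of-magnitude figure rather than a measurement.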
Software
This model was fine-tuned using free Google Colab GPU time.
Citation
Eleanor Lin. Developing Performant Models for Translating Spoken Taiwanese Into Spoken English Using Free and Publicly Available Resources. Columbia University Program of Linguistics, April 2024. Undergraduate thesis.
BibTeX:
Forthcoming
APA:
Forthcoming
Model Card Authors
Eleanor Lin
Model Card Contact