Whisper Large V3 (Thai): Combined V1
This model is a fine-tuned version of openai/whisper-large-v3 on augmented versions of mozilla-foundation/common_voice_13_0 (th), google/fleurs, and curated datasets. It achieves the following results on the Common Voice 13 test set:
- WER: 6.59 (with Deepcut Tokenizer)
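Thai is written without spaces between words, so word-level WER depends on the word segmenter used. The following is a minimal sketch of a Deepcut-based WER computation, not the card's actual evaluation script; the deepcut, evaluate, and jiwer packages and the sample strings are assumptions for illustration:

# Sketch only: word-level WER with Deepcut segmentation (illustrative, not the
# exact evaluation used for the reported 6.59 WER).
# Assumes: pip install deepcut evaluate jiwer
import deepcut
import evaluate

wer_metric = evaluate.load("wer")

def segment(text):
    # Segment Thai text into words and re-join with spaces for word-level WER
    return " ".join(deepcut.tokenize(text))

references = ["ตัวอย่างประโยคอ้างอิง"]    # illustrative reference transcript
predictions = ["ตัวอย่างประโยคที่ถอดได้"]  # illustrative model output

wer = wer_metric.compute(
    predictions=[segment(p) for p in predictions],
    references=[segment(r) for r in references],
)
print(f"WER: {wer:.4f}")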
Model description
Use the model with Hugging Face's transformers as follows:
import torch
from transformers import pipeline

MODEL_NAME = "biodatlab/whisper-th-large-v3-combined"  # specify the model name
lang = "th"  # Thai language code
device = 0 if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_NAME,
    chunk_length_s=30,
    device=device,
)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(
    language=lang,
    task="transcribe",
)
text = pipe("audio.mp3")["text"]  # transcribe an MP3 file to text
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 10000
- mixed_precision_training: Native AMP
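The training script itself is not part of this card; the block below is a minimal sketch of Seq2SeqTrainingArguments that mirrors the hyperparameters listed above. The output directory and any flag not in the list are assumptions.

# Sketch only: Seq2SeqTrainingArguments mirroring the hyperparameters above.
# output_dir and anything not listed in the table are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-th-large-v3-combined",  # assumed path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=10000,
    fp16=True,  # Native AMP mixed-precision training
)
# AdamW with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer default optimizer,
# so no extra argument is needed for it.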
Framework versions
- Transformers 4.37.2
- Pytorch 2.1.0
- Datasets 2.16.1
- Tokenizers 0.15.1
Citation
Cite using BibTeX:
@misc{thonburian_whisper_med,
  author    = {Atirut Boribalburephan and Zaw Htet Aung and Knot Pipatsrisawat and Titipat Achakulvisut},
  title     = {Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition},
  year      = {2022},
  url       = {https://huggingface.co/biodatlab/whisper-th-medium-combined},
  doi       = {10.57967/hf/0226},
  publisher = {Hugging Face}
}