Whisper tiny fine-tuned on a large collection of Vietnamese speech datasets.
TODO:
- train, then publish the checkpoint
- evaluate WER on Common Voice, FLEURS & VIVOS
- convert to openai-whisper, whisper.cpp, faster-whisper (see the conversion sketch after this list)
- convert to ONNX, to try with https://github.com/k2-fsa/sherpa-onnx & https://github.com/zhuzilin/whisper-openvino
- convert to TensorRT: https://github.com/openai/whisper/discussions/169
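For the faster-whisper item, a minimal conversion sketch using CTranslate2's Python converter; the output directory name is an arbitrary choice, and this assumes the standard converter works on this checkpoint as-is:

```python
# hedged sketch: convert the HF checkpoint to CTranslate2 format for faster-whisper
# ("whisper-tiny-vi-ct2" is an arbitrary output directory name)
from ctranslate2.converters import TransformersConverter

TransformersConverter("doof-ferb/whisper-tiny-vi").convert("whisper-tiny-vi-ct2", quantization="float16")

# then load the converted directory with faster-whisper
from faster_whisper import WhisperModel
segments, info = WhisperModel("whisper-tiny-vi-ct2").transcribe("audio.mp3", language="vi")
print("".join(s.text for s in segments))
```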
Training: 21k steps, 5% warm-up, batch size 16×2 (on Kaggle's free dual T4).
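A minimal sketch of Hugging Face training arguments matching those numbers; everything beyond the stated steps, warm-up ratio, and per-device batch size is an assumption (the actual scripts are in the repo linked below):

```python
# hedged sketch: training arguments matching the stated hyper-parameters
# (output_dir and fp16 are assumptions, not taken from the actual scripts)
from transformers import Seq2SeqTrainingArguments

ARGS = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-vi",  # hypothetical output path
    max_steps=21_000,                # 21k steps
    warmup_ratio=0.05,               # 5% warm-up
    per_device_train_batch_size=16,  # ×2 GPUs → batch size 16×2
    fp16=True,                       # assumed, matching the float16 evaluation
)
```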
WER evaluated manually on the Vietnamese part of each test set:
| @ float16 | CommonVoice v16.1 | FLEURS | VIVOS |
|---|---|---|---|
| original whisper-tiny | >100% | 88.6% | 62.5% |
| this model | 26.6% | 37.1% | 18.7% |
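To reproduce such numbers, WER can be computed with the Hugging Face `evaluate` library; a minimal sketch (dataset loading and any text normalization, which affect the exact scores, are omitted):

```python
# hedged sketch: compute WER from paired transcripts
# (a real evaluation also needs dataset loading + Vietnamese text normalization)
import evaluate

WER = evaluate.load("wer")
predictions = ["xin chào thế giới"]  # model transcripts (hypothetical example)
references = ["xin chào thế giới"]   # ground-truth transcripts
print(WER.compute(predictions=predictions, references=references))  # 0.0 when identical
```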
All training and evaluation scripts are in my repo: https://github.com/phineas-pta/fine-tune-whisper-vi
usage example:
```python
import torch
from transformers import pipeline

# load the fine-tuned model as a float16 ASR pipeline on GPU
PIPE = pipeline(task="automatic-speech-recognition", model="doof-ferb/whisper-tiny-vi", device="cuda:0", torch_dtype=torch.float16)
PIPE_KWARGS = {"language": "vi", "task": "transcribe"}  # force Vietnamese transcription

PIPE("audio.mp3", generate_kwargs=PIPE_KWARGS)["text"]
```