whisper-tiny-finetune-hindi-fleurs
This model is a fine-tuned version of openai/whisper-tiny on the google/fleurs dataset. It achieves the following results on the evaluation set:
- Loss: 0.8315
- Wer Ortho: 0.4313
- Wer: 0.4262
A working Hugging Face Space can be found here
Model description
This model fine-tunes openai/whisper-tiny on the Hindi subset of google/fleurs. It lowers the WER on that subset from 102.3%, as stated in the Whisper paper, to 42.6% (0.4262 as a fraction).
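The paper figure is a percentage, while the metrics reported in this card are fractions, so the two need to be put on the same scale before they are compared. A minimal sketch of that arithmetic, using only the numbers quoted in this card (the variable names are illustrative):

```python
# Put both WER figures on the same scale before comparing them.
baseline_wer_pct = 102.3  # whisper-tiny on Hindi, as quoted from the Whisper paper (percent)
finetuned_wer = 0.4262    # final "Wer" value reported in this card (fraction)

finetuned_wer_pct = finetuned_wer * 100
absolute_drop = baseline_wer_pct - finetuned_wer_pct
relative_drop = absolute_drop / baseline_wer_pct

print(f"fine-tuned WER:     {finetuned_wer_pct:.2f}%")
print(f"absolute reduction: {absolute_drop:.2f} points")
print(f"relative reduction: {relative_drop:.1%}")
```

A baseline above 100% is possible because WER counts insertions as errors, so a hypothesis with many spurious words can contain more errors than the reference has words.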
Intended uses & limitations
This model is intended for low-compute edge devices such as the Raspberry Pi Pico/3/3B/4 and offers real-time transcription of Hindi audio into the English lexicon.
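A minimal inference sketch, assuming the checkpoint is published under the repo id shown in this card's model tree and that the standard Transformers speech-recognition pipeline is an acceptable way to load it. The function name and the audio file name are hypothetical, and nothing here has been measured on a Raspberry Pi.

```python
def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the fine-tuned checkpoint."""
    # Imported lazily so this module loads even where transformers is not installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="Aryan-401/whisper-tiny-finetune-hindi-fleurs",  # repo id from this card
        chunk_length_s=30,  # Whisper operates on 30-second windows
    )
    # Decoding from a file path requires ffmpeg to be available on the system.
    return asr(audio_path)["text"]


# Usage (hypothetical file name):
#   text = transcribe("sample_hindi.wav")
```

Building the pipeline once and reusing it across calls would avoid reloading the weights for every file, which matters more on constrained hardware.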
Training and evaluation data
The model was trained on the hi_in subset of google/fleurs, with WER as the evaluation metric.
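For readers unfamiliar with the metric, WER is the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below is a plain-Python illustration, not the evaluation code used for this card. `simple_normalize` is a hypothetical stand-in for whatever text normalizer produced the "Wer" column; "Wer Ortho" is assumed to be the same metric on un-normalized text.

```python
import string


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a fraction of the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def simple_normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase and drop ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


reference = "Namaste, duniya!"
hypothesis = "namaste duniya"
wer_ortho = wer(reference, hypothesis)  # casing and punctuation count as errors
wer_norm = wer(simple_normalize(reference), simple_normalize(hypothesis))
```

In this toy pair the orthographic score penalizes casing and punctuation while the normalized score does not, which is why the "Wer Ortho" values in this card sit slightly above the "Wer" values.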
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 50
- training_steps: 500
- mixed_precision_training: Native AMP
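As a rough sketch, the list above could be expressed as keyword arguments for `Seq2SeqTrainingArguments` from Transformers. This is a reconstruction, not the author's training script: the mapping of `train_batch_size` to the per-device argument assumes a single device, and `output_dir` is a hypothetical path.

```python
# Keyword arguments mirroring the hyperparameter list in this card.
training_kwargs = {
    "output_dir": "./whisper-tiny-hi",  # hypothetical path
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 32,  # assumes a single device
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant_with_warmup",
    "warmup_steps": 50,
    "max_steps": 500,
    "fp16": True,  # "Native AMP"; requires a CUDA device
}


def build_training_args():
    """Construct the arguments object; import is lazy so the dict is usable on its own."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**training_kwargs)
```

Keeping the values in a plain dictionary makes them easy to inspect or log on a machine without a GPU, where constructing the arguments object with `fp16=True` would fail.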
Training results
Training Loss | Epoch | Step | Validation Loss | Wer Ortho | Wer |
---|---|---|---|---|---|
1.8112 | 1.39 | 100 | 1.7274 | 0.6323 | 0.6258 |
1.0387 | 2.78 | 200 | 1.1194 | 0.5130 | 0.5072 |
0.7671 | 4.17 | 300 | 0.9671 | 0.4665 | 0.4613 |
0.5283 | 5.56 | 400 | 0.8840 | 0.4494 | 0.4440 |
0.4458 | 6.94 | 500 | 0.8315 | 0.4313 | 0.4262 |
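
The Epoch and Step columns can be used to estimate how many optimizer steps make up one pass over the training data. The sketch below does that arithmetic from the table values; the examples-per-epoch figure additionally assumes a single device and no gradient accumulation, neither of which is stated in this card.

```python
# (step, epoch) pairs copied from the training results table.
checkpoints = [(100, 1.39), (200, 2.78), (300, 4.17), (400, 5.56), (500, 6.94)]
batch_size = 32  # train_batch_size from the hyperparameter list

steps_per_epoch = [step / epoch for step, epoch in checkpoints]
avg_steps_per_epoch = sum(steps_per_epoch) / len(steps_per_epoch)

# Assumption: one device, no gradient accumulation.
approx_examples_per_epoch = avg_steps_per_epoch * batch_size

print(f"steps per epoch:    ~{avg_steps_per_epoch:.0f}")
print(f"examples per epoch: ~{approx_examples_per_epoch:.0f}")
```

The estimate works out to roughly 72 steps per epoch, so the 500 training steps correspond to just under seven passes over the data, consistent with the final Epoch value in the table.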
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0
Citations
@inproceedings{Bhat:2014:ISS:2824864.2824872,
author = {Bhat, Irshad Ahmad and Mujadia, Vandan and Tammewar, Aniruddha and Bhat, Riyaz Ahmad and Shrivastava, Manish},
title = {IIIT-H System Submission for FIRE2014 Shared Task on Transliterated Search},
booktitle = {Proceedings of the Forum for Information Retrieval Evaluation},
series = {FIRE '14},
year = {2015},
isbn = {978-1-4503-3755-7},
location = {Bangalore, India},
pages = {48--53},
numpages = {6},
url = {http://doi.acm.org/10.1145/2824864.2824872},
doi = {10.1145/2824864.2824872},
acmid = {2824872},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Information Retrieval, Language Identification, Language Modeling, Perplexity, Transliteration},
}
@misc{radford2022whisper,
doi = {10.48550/ARXIV.2212.04356},
url = {https://arxiv.org/abs/2212.04356},
author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
title = {Robust Speech Recognition via Large-Scale Weak Supervision},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}
Model tree for Aryan-401/whisper-tiny-finetune-hindi-fleurs: base model openai/whisper-tiny