metadata

language:
  - en
license: apache-2.0
base_model: openai/whisper-tiny
tags:
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: whispertiny-shreyas
    results: []

whispertiny-shreyas

This model is a fine-tuned version of Whisper Tiny (openai/whisper-tiny) on the AI4Bharat-svarah dataset. It achieves the following results on the evaluation set:

Loss: 0.5414
Wer: 22.8322
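
For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate (WER) is commonly defined: the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not the evaluation code used for this model. The value reported above is presumably this ratio multiplied by 100.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not taken from the evaluation set):
# one substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {100 * score:.2f}%")
```

Corpus-level tools typically sum the edits over all utterances and divide by the total number of reference words, rather than averaging per-utterance scores, so a per-sentence helper like this one is only a building block.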

Model description

Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision.

The models were trained on either English-only data or multilingual data. The English-only models were trained on the task of speech recognition. The multilingual models were trained on both speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the same language as the audio. For speech translation, the model predicts transcriptions in a different language from the audio.
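
As a rough illustration of how the task is selected at inference time, the sketch below uses the Transformers `automatic-speech-recognition` pipeline and passes `task` and `language` through `generate_kwargs`. The helper names `build_generate_kwargs` and `transcribe_file`, the model id placeholder, and the audio path are assumptions for illustration; this card does not state the Hub namespace of the model, so substitute the actual repository id. Decoding an audio file from a path also assumes `ffmpeg` is available.

```python
def build_generate_kwargs(task: str = "transcribe", language: str = "english") -> dict:
    """Return the generation options that tell Whisper which task to perform."""
    if task not in {"transcribe", "translate"}:
        raise ValueError("task must be 'transcribe' or 'translate'")
    return {"task": task, "language": language}


def transcribe_file(audio_path: str, model_id: str, task: str = "transcribe") -> str:
    """Run speech recognition on one audio file and return the predicted text."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, generate_kwargs=build_generate_kwargs(task))
    return result["text"]


# Hypothetical usage; the model id and file name are placeholders:
# text = transcribe_file("sample.wav", "your-namespace/whispertiny-shreyas")
```

Since this checkpoint was fine-tuned on English speech, `task="transcribe"` with `language="english"` is the setting that matches the training data.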

Whisper checkpoints come in five configurations of varying model sizes. The smallest four are trained on either English-only or multilingual data. The largest checkpoints are multilingual only. All ten of the pre-trained checkpoints are available on the Hugging Face Hub. The original Whisper model card summarises the checkpoints in a table with links to the models on the Hub.

Training procedure

Training followed Sanchit's blog. If you want to reproduce it, refer to the blog and adapt the code to the versions of the dependencies you have installed.

Demo

I have hosted a demo of this model on HF Spaces (16GB CPU inference). Here is the link to the demo

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 2000
mixed_precision_training: Native AMP
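
To make two of these settings concrete, here is a small sketch of the arithmetic behind `total_train_batch_size` and of a linear schedule with warmup using the values listed above. It assumes a single training device (consistent with 8 × 2 = 16) and the usual shape of a linear warmup followed by a linear decay to zero; the function names are illustrative and this is not the training script itself.

```python
LEARNING_RATE = 1e-5
WARMUP_STEPS = 500
TRAINING_STEPS = 2000
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 2


def effective_batch_size(num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (assumes one device by default)."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        # Ramp up from 0 to the peak learning rate over the warmup steps.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay from the peak to 0 over the remaining steps.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


print("effective batch size:", effective_batch_size())
for step in (0, 250, 500, 1000, 2000):
    print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at step 500 and has fallen to two thirds of its peak by the first evaluation at step 1000.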

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.2412	2.6702	1000	0.5319	22.8914
0.1071	5.3405	2000	0.5414	22.8322
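
A few derived quantities can help when reading this table. The sketch below recomputes them from the two logged rows and the total train batch size of 16. The estimate of examples per epoch is only approximate: it assumes the logged epoch is the optimizer step count divided by the steps per epoch, and it ignores any partial final batch.

```python
TOTAL_TRAIN_BATCH_SIZE = 16

# (training_loss, epoch, step, validation_loss, wer), copied from the table above
rows = [
    (0.2412, 2.6702, 1000, 0.5319, 22.8914),
    (0.1071, 5.3405, 2000, 0.5414, 22.8322),
]

for training_loss, epoch, step, validation_loss, wer in rows:
    steps_per_epoch = step / epoch
    approx_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE
    print(f"step {step}: ~{steps_per_epoch:.1f} steps/epoch, "
          f"~{approx_examples:.0f} training examples/epoch")

first, last = rows
relative_wer_change = (last[4] - first[4]) / first[4]  # negative means improvement
validation_loss_change = last[3] - first[3]            # positive means it got worse
train_loss_change = last[0] - first[0]

print(f"relative WER change: {100 * relative_wer_change:.2f}%")
print(f"validation loss change: {validation_loss_change:+.4f}")
print(f"training loss change: {train_loss_change:+.4f}")
```

On these numbers both rows give about 374.5 optimizer steps per epoch. Between step 1000 and step 2000 the training loss roughly halves while the validation loss rises slightly and WER improves by well under one percent in relative terms, which suggests most of the benefit was already reached by step 1000.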

Framework versions

Transformers 4.43.3
PyTorch 2.4.1
Datasets 2.14.7
Tokenizers 0.19.1