
Model Card for WhisperLiveSubs

This model is a fine-tuned version of OpenAI's Whisper model, trained on the Urdu subset of the Common Voice dataset. It is optimized for transcribing Urdu-language audio.

Model Description

This model is a small variant of the Whisper model fine-tuned on the Common Voice dataset for the Urdu language. It is intended for automatic speech recognition (ASR) tasks and performs well in transcribing Urdu speech.

  • Developed by: codewithdark
  • Model type: Whisper-based model for ASR
  • Language(s) (NLP): Urdu (ur)
  • License: Apache 2.0
  • Fine-tuned from model: openai/whisper-small

Uses

Direct Use

This model can be used directly for transcribing Urdu audio into text. It is suitable for applications such as:

  • Voice-to-text transcription services
  • Captioning Urdu language videos
  • Speech analytics in Urdu

Out-of-Scope Use

The model may not perform well for:

  • Non-Urdu languages
  • Extremely noisy environments
  • Very long audio sequences without segmentation

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("codewithdark/WhisperLiveSubs")
model = WhisperForConditionalGeneration.from_pretrained("codewithdark/WhisperLiveSubs")

# `waveform` should be a 1-D float array of 16 kHz mono audio,
# e.g. loaded with librosa.load("audio.wav", sr=16000)
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)

Training Data

The model was fine-tuned on the Mozilla Common Voice dataset, specifically the Urdu subset, which consists of approximately 141 hours of transcribed Urdu speech.

Preprocessing

The audio was resampled to 16 kHz, and the text was tokenized using the Whisper tokenizer configured for Urdu.
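As an illustration of the resampling step, here is a dependency-free sketch using linear interpolation. This is illustrative only: real pipelines use properly filtered resamplers from librosa or torchaudio to avoid aliasing.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono waveform via linear interpolation (illustrative only)."""
    n_out = int(len(samples) * dst_rate / src_rate)
    ratio = src_rate / dst_rate
    out = []
    for i in range(n_out):
        pos = i * ratio              # fractional position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Upsample one second of 8 kHz audio to the 16 kHz rate Whisper expects.
audio_8k = [0.0] * 8000
audio_16k = resample_linear(audio_8k, 8000, 16000)
```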

Training Hyperparameters

  • Training regime: Mixed precision (fp16)
  • Batch size: 8
  • Gradient accumulation steps: 2
  • Learning rate: 1e-5
  • Max steps: 4000
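These settings map onto the Transformers trainer roughly as follows. This is a sketch, not the exact training script: `output_dir` is a placeholder, and any options not listed above are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ur",   # placeholder path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,     # effective batch size of 16
    learning_rate=1e-5,
    max_steps=4000,
    fp16=True,                         # mixed-precision training
)
```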

Metrics

Word Error Rate (WER) was the primary metric used to evaluate the model's performance.
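WER counts word-level substitutions, insertions, and deletions against the reference transcript, divided by the number of reference words. Below is a minimal, dependency-free sketch of the metric; in practice, libraries such as jiwer or Hugging Face's evaluate are used.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution out of three reference words -> WER of 1/3
print(word_error_rate("the cat sat", "the bat sat"))
```

By this definition, a WER around 51% means roughly half of the reference words were transcribed incorrectly.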

Results

  • Training Loss: 0.2005
  • Validation Loss: 0.5342
  • WER: 51.06%

This was my first fine-tuning run for this model; further training and tuning can improve accuracy and reduce the WER.

Environmental Impact
  • Hardware Type: P100 GPU
  • Hours used: 10
  • Cloud Provider: Kaggle
  • Compute Region: PK

Model Architecture and Objective

The WhisperLiveSubs model is based on the Whisper architecture, designed for automatic speech recognition.

Software

  • Framework: PyTorch
  • Transformers Version:

Summary

The model demonstrates acceptable performance for Urdu transcription, but there is room for improvement in terms of WER, especially in noisy conditions or with diverse accents.

Model Card Contact

For inquiries, please contact codewithdark90@gmail.com.

Citation

@Codewithdark. (2024). WhisperLiveSubs: An Urdu Automatic Speech Recognition Model. Retrieved from https://huggingface.co/codewithdark/WhisperLiveSubs