# Model Card for WhisperLiveSubs

This model is a fine-tuned version of OpenAI's Whisper model on the Common Voice dataset for Urdu speech recognition. It is optimized for transcribing Urdu-language audio.

## Model Details

### Model Description

This model is the small variant of Whisper, fine-tuned on the Common Voice dataset for the Urdu language. It is intended for automatic speech recognition (ASR) tasks and performs well at transcribing Urdu speech.

- **Developed by:** codewithdark
- **Model type:** Whisper-based model for ASR
- **Language(s) (NLP):** Urdu (ur)
- **License:** Apache 2.0
- **Finetuned from model:** openai/whisper-small

## Uses

### Direct Use

This model can be used directly for transcribing Urdu audio into text. It is suitable for applications such as:

- Voice-to-text transcription services
- Captioning Urdu-language videos
- Speech analytics in Urdu

### Out-of-Scope Use

The model may not perform well on:

- Non-Urdu languages
- Extremely noisy environments
- Very long audio sequences without segmentation

## How to Get Started with the Model

Use the code below to load the model and processor; a complete transcription sketch is included in the appendix at the end of this card.

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("codewithdark/WhisperLiveSubs")
model = WhisperForConditionalGeneration.from_pretrained("codewithdark/WhisperLiveSubs")

# Your transcription code here (see the appendix for a complete example)
```

## Training Details

### Training Data

The model was fine-tuned on the Mozilla Common Voice dataset, specifically the Urdu subset, which consists of approximately 141 hours of transcribed Urdu speech.

#### Preprocessing

The audio was resampled to 16 kHz, and the text was tokenized using the Whisper tokenizer configured for Urdu.

#### Training Hyperparameters

- **Training regime:** Mixed precision (fp16)
- **Batch size:** 8
- **Gradient accumulation steps:** 2
- **Learning rate:** 1e-5
- **Max steps:** 4000

## Evaluation

#### Metrics

Word Error Rate (WER) was the primary metric used to evaluate the model's performance.

### Results

- **Training Loss:** 0.2005
- **Validation Loss:** 0.5342
- **WER:** 51.06

*This is my first time fine-tuning this model; the current performance is a baseline, and further training can improve accuracy and reduce the WER.*

## Environmental Impact

- **Hardware Type:** P100 GPU
- **Hours used:** 10 hours
- **Cloud Provider:** Kaggle
- **Compute Region:** PK

## Technical Specifications

### Model Architecture and Objective

The WhisperLiveSubs model is based on the Whisper architecture, designed for automatic speech recognition.

#### Software

- **Framework:** PyTorch
- **Transformers Version:**

#### Summary

The model demonstrates acceptable performance for Urdu transcription, but there is room for improvement in WER, especially in noisy conditions or with diverse accents.

## Model Card Contact

For inquiries, please contact codewithdark90@gmail.com.

## Citation

Codewithdark. (2024). *WhisperLiveSubs: An Urdu Automatic Speech Recognition Model*. Retrieved from https://huggingface.co/codewithdark/WhisperLiveSubs
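
## Appendix: Example Usage Sketches

The getting-started snippet above leaves the transcription step as a placeholder. The sketch below shows one way to complete it, assuming a local audio file (`urdu_sample.wav` is a hypothetical path) loaded with `librosa` and the standard `transformers` generation API; exact arguments may need adjusting for your `transformers` version.

```python
import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("codewithdark/WhisperLiveSubs")
model = WhisperForConditionalGeneration.from_pretrained("codewithdark/WhisperLiveSubs")
model.eval()

# Load a local recording and resample it to the 16 kHz rate Whisper expects.
# "urdu_sample.wav" is a placeholder path, not a file shipped with this model.
audio, _ = librosa.load("urdu_sample.wav", sr=16000)

# Convert the waveform into log-Mel input features.
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Force Urdu transcription so the model does not try to auto-detect the language.
forced_decoder_ids = processor.get_decoder_prompt_ids(language="ur", task="transcribe")

with torch.no_grad():
    predicted_ids = model.generate(
        inputs.input_features,
        forced_decoder_ids=forced_decoder_ids,
    )

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```

Forcing the decoder prompt to Urdu keeps the multilingual Whisper checkpoint from spending capacity on language detection, which matters for a model fine-tuned on a single language.

For the WER figure reported above, one minimal way to compute the metric on your own predictions is the Hugging Face `evaluate` library; the strings below are illustrative only, not drawn from the Common Voice test split.

```python
import evaluate

wer_metric = evaluate.load("wer")

# Replace these with model transcriptions and the matching reference transcripts.
predictions = ["یہ ایک مثال ہے"]
references = ["یہ ایک اچھی مثال ہے"]

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```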