---
license: mit
datasets:
- M9and2M/Wolof_ASR_dataset
language:
- wo
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- Wolof
- ASR
---

# Wolof ASR Model (Based on Whisper-Small) Trained with a Mixed Human- and Machine-Generated Dataset

## Model Overview

This repository hosts an Automatic Speech Recognition (ASR) model for the Wolof language, fine-tuned from OpenAI's Whisper-small model. This model aims to provide accurate transcription of Wolof audio data.

## Model Details

- **Model Base**: Whisper-small
- **Loss**: 0.123
- **WER**: 0.16

## Dataset

The dataset used for training and evaluating this model is a collection from various sources, ensuring a rich and diverse set of Wolof audio samples. The collection, available on my Hugging Face account, is filtered to keep only audio clips shorter than 6 seconds. In addition to this dataset, audio from YouTube videos is used to synthesize labeled data. This machine-generated data is mixed into the training set and represents 19% of the data used for training.

- **Training Dataset**: 57 hours of audio, plus 13 hours of audio with machine-generated transcripts
- **Test Dataset**: 10 hours

For detailed information about the dataset, please refer to [M9and2M/Wolof_ASR_dataset](https://huggingface.co/datasets/M9and2M/Wolof_ASR_dataset).

## Training

The training process was adapted from the code in [Finetune Wav2vec 2.0 For Speech Recognition](https://github.com/khanld/ASR-Wa2vec-Finetune), originally written to fine-tune Wav2Vec2.0 for speech recognition. Special thanks to the author, Duy Khanh, Le, for providing a robust and flexible training framework.
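
As a preprocessing illustration, the duration filter and the data mix described in the Dataset section can be sketched as follows. This is a minimal sketch, not the code used for this model: the record layout (`path`, `duration` in seconds) and the helper names `keep_short_clips` and `machine_labelled_share` are assumptions made for the example.

```python
# Hypothetical record layout: each sample is a dict with a file path and a
# duration in seconds. Neither the field names nor the helpers come from the
# original training code.
MAX_DURATION_S = 6.0  # the card keeps only audio shorter than 6 seconds


def keep_short_clips(samples, max_duration_s=MAX_DURATION_S):
    """Return only the samples whose duration is strictly below the limit."""
    return [s for s in samples if s["duration"] < max_duration_s]


def machine_labelled_share(first_part_hours, machine_hours):
    """Fraction of the training audio that carries machine-generated transcripts."""
    return machine_hours / (first_part_hours + machine_hours)


if __name__ == "__main__":
    demo = [
        {"path": "a.wav", "duration": 2.4},
        {"path": "b.wav", "duration": 6.0},   # dropped: not shorter than 6 s
        {"path": "c.wav", "duration": 5.9},
    ]
    print([s["path"] for s in keep_short_clips(demo)])
    print(f"{machine_labelled_share(57, 13):.1%}")
```

With the 57 hours and 13 hours listed under Training Dataset, the share evaluates to roughly 18.6%, which rounds to the 19% quoted in the Dataset section.
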

The model was trained with the following configuration:

- **Seed**: 19
- **Training Batch Size**: 1
- **Gradient Accumulation Steps**: 8
- **Number of GPUs**: 2

### Optimizer: AdamW

- **Learning Rate**: 1e-7

### Scheduler: OneCycleLR

- **Max Learning Rate**: 5e-5

## Acknowledgements

This model was built using OpenAI's [Whisper-small](https://huggingface.co/openai/whisper-small) architecture and fine-tuned with a dataset collected from various sources. Special thanks to the creators and contributors of the dataset.

## More Information

This model was developed as part of my Master's thesis at ETSIT-UPM, Madrid, under the supervision of Prof. Luis A. Hernández Gómez.

## Contact

For any inquiries or questions, please contact mamadou.marone@ensea.fr
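
## Training Configuration Sketch

To make the numbers in the Training section more concrete, the sketch below computes the effective batch size and the shape of a one-cycle learning-rate schedule peaking at 5e-5. It is an illustration under stated assumptions, not the training code: it assumes data-parallel training where each GPU processes its own batch, and the total step count, warm-up fraction (`pct_start`), and divisors (`div_factor`, `final_div_factor`) are illustrative values that are not stated in this card.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus):
    """Samples contributing to one optimizer update (assumes data parallelism)."""
    return per_device_batch * grad_accum_steps * num_gpus


def _cos_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` for a two-phase cosine one-cycle schedule.

    pct_start, div_factor and final_div_factor are assumed values for this
    sketch; the card only states the maximum learning rate.
    """
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1
    if warmup_end <= 0 or warmup_end >= total_steps - 1:
        raise ValueError("total_steps is too small for the chosen pct_start")
    if step <= warmup_end:
        # Phase 1: ramp up from initial_lr to max_lr.
        return _cos_anneal(initial_lr, max_lr, step / warmup_end)
    # Phase 2: decay from max_lr down to min_lr.
    pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
    return _cos_anneal(max_lr, min_lr, pct)


if __name__ == "__main__":
    print(effective_batch_size(1, 8, 2))
    total = 1000  # hypothetical number of optimizer steps
    for s in (0, 299, 999):
        print(s, one_cycle_lr(s, total, max_lr=5e-5))
```

Under these assumptions, a batch size of 1 with 8 accumulation steps on 2 GPUs gives 16 samples per optimizer update, and the schedule rises to 5e-5 at the end of the warm-up phase before decaying.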