---
license: mit
datasets:
- M9and2M/Wolof_ASR_dataset
language:
- wo
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- Wolof
- ASR
---
|
|
|
|
|
# Wolof ASR Model (Based on Whisper-Small) Trained on a Mixed Human- and Machine-Generated Dataset
|
|
|
## Model Overview
|
|
|
This repository hosts an Automatic Speech Recognition (ASR) model for the Wolof language, fine-tuned from OpenAI's Whisper-small model. This model aims to provide accurate transcription of Wolof audio data.
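As a quick start, here is a minimal usage sketch with the Transformers `pipeline` API. The repository id placeholder and the audio filename are assumptions and should be replaced with the actual values.

```python
# Minimal usage sketch (assumption: replace the placeholder repo id with the
# actual Hugging Face repository hosting this model, and point the pipeline
# at a real Wolof audio file, ideally 16 kHz mono).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="<this-repo-id>",  # placeholder, not a real repository id
)

result = asr("wolof_clip.wav")  # hypothetical local audio file
print(result["text"])
```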
|
|
|
## Model Details
|
|
|
- **Model Base**: Whisper-small
- **Loss**: 0.123
- **WER**: 0.16 (word error rate; see the sketch below)
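WER counts word-level substitutions, insertions, and deletions against a reference transcript. A minimal illustration with the `jiwer` library follows; the example strings are invented for illustration and are not taken from the evaluation set.

```python
# Illustration of the reported metric: word error rate (WER). The transcripts
# below are invented examples, not data from the actual test set.
from jiwer import wer

reference = "salaam aleekum nanga def"   # reference transcript (4 words)
hypothesis = "salaam aleekum nga def"    # model output with one substitution

print(wer(reference, hypothesis))  # 1 error / 4 words = 0.25
```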
|
|
|
## Dataset
|
|
|
The dataset used for training and evaluating this model is a collection from various sources, ensuring a rich and diverse set of Wolof audio samples. The collection, available in my Hugging Face account, was filtered to keep only audio clips shorter than 6 seconds (a sketch of this filter follows the dataset details below). In addition to this dataset, audio from YouTube videos was used to synthesize labeled data. This machine-generated data was mixed with the training dataset and represents 19% of the data used during training.
|
|
|
- **Training Dataset**: 57 hours of human-transcribed audio plus 13 hours of audio with machine-generated transcripts
- **Test Dataset**: 10 hours
|
|
|
For detailed information about the dataset, please refer to the [M9and2M/Wolof_ASR_dataset](https://huggingface.co/datasets/M9and2M/Wolof_ASR_dataset).
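For reference, here is a hedged sketch of the duration filter mentioned above, using the `datasets` library. The split name and the decoded audio layout (`array` plus `sampling_rate`) are assumptions based on the usual Hub conventions.

```python
# Sketch of the "< 6 seconds" filter described above (assumptions: the split
# is named "train" and the "audio" column decodes to an array + sampling rate).
from datasets import Audio, load_dataset

ds = load_dataset("M9and2M/Wolof_ASR_dataset", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # decode at 16 kHz

def shorter_than_6s(example):
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"] < 6.0

ds = ds.filter(shorter_than_6s)
print(f"{len(ds)} clips shorter than 6 seconds")
```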
|
|
|
## Training
|
|
|
The training process was adapted from the code in the [Finetune Wa2vec 2.0 For Speech Recognition](https://github.com/khanld/ASR-Wa2vec-Finetune) repository, originally written to fine-tune Wav2Vec 2.0 for speech recognition. Special thanks to the author, Duy Khanh Le, for providing a robust and flexible training framework.
|
|
|
The model was trained with the following configuration (a sketch of how these settings fit together is shown after the scheduler section):
|
|
|
- **Seed**: 19
- **Training Batch Size**: 1
- **Gradient Accumulation Steps**: 8
- **Number of GPUs**: 2
|
|
|
### Optimizer: AdamW
|
|
|
- **Learning Rate**: 1e-7
|
|
|
### Scheduler: OneCycleLR
|
|
|
- **Max Learning Rate**: 5e-5
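The actual training loop lives in the framework linked above; the following is only a sketch of how the listed optimizer and scheduler settings could be instantiated in PyTorch. `total_steps` is a placeholder that would normally be derived from the dataset size, batch size, gradient accumulation, and number of epochs.

```python
# Sketch only: shows how the listed hyperparameters could be instantiated in
# PyTorch; the real training loop comes from the ASR-Wa2vec-Finetune framework.
import torch
from transformers import WhisperForConditionalGeneration

torch.manual_seed(19)  # seed listed above

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-7)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=5e-5,         # peak learning rate of the one-cycle schedule
    total_steps=10_000,  # placeholder: set from dataset size and epoch count
)
```

With a per-device batch size of 1, 8 gradient-accumulation steps, and 2 GPUs, the effective batch size is 16.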
|
|
|
## Acknowledgements
|
This model was built using OpenAI's [Whisper-small](https://huggingface.co/openai/whisper-small) architecture and fine-tuned with a dataset collected from various sources. Special thanks to the creators and contributors of the dataset.
|
|
|
|
|
|
|
|
|
|
|
|
## More Information
|
|
|
This model was developed in the context of my Master's thesis at ETSIT-UPM, Madrid, under the supervision of Prof. Luis A. Hernández Gómez.
|
|
|
|
|
## Contact
|
|
|
For any inquiries or questions, please contact mamadou.marone@ensea.fr.