Model Card for whisper-small-es-cl
Finetuned Whisper Model for Automatic Speech Recognition in Spanish from Chile
Model Details
Whisper is a model from OpenAI, based on an encoder-decoder Transformer architecture, designed to generate transcription text from audio sequences. This model can be used with the WhisperX pipeline developed by @m-bain, which integrates the transcription text with the pyannote.audio library to perform diarization. Diarization separates and classifies the speakers in the audio, producing a transcription segmented by speaker. The process returns a transcription object whose segments are associated with specific speakers.
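The pipeline described above can be sketched as follows (a hedged illustration based on the WhisperX API as documented in @m-bain's repository; the model path, audio file name, and Hugging Face token are placeholders, and the checkpoint must already be in faster-whisper format, as noted under Recommendations):

```python
import whisperx

device = "cuda"  # or "cpu"

# Load a faster-whisper-format model (placeholder path; see the
# Recommendations section on converting this checkpoint to that format).
model = whisperx.load_model("path/to/whisper-small-es-cl-ct2", device, language="es")

# Transcribe the audio file (placeholder path).
audio = whisperx.load_audio("conversation.wav")
result = model.transcribe(audio)

# Align word timestamps, then diarize with pyannote (requires a HF token).
align_model, metadata = whisperx.load_align_model(language_code="es", device=device)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

diarize_model = whisperx.DiarizationPipeline(use_auth_token="hf_...", device=device)
diarize_segments = diarize_model(audio)

# Each segment now carries a "speaker" label alongside its text.
result = whisperx.assign_word_speakers(diarize_segments, result)
```

This sketch requires a GPU (or `device="cpu"`), the `whisperx` package, and pyannote access, so it is not runnable as-is without those resources.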
Model Description
- Developed by: OpenAI
- Model type: Automatic Speech Recognition
- Language(s) (NLP): Spanish from Chile
- License: MIT
- Finetuned from model: openai/whisper-small
Model Details
- Name: whisper-small-es-cl
- Model Type: Sequence-to-sequence automatic speech recognition
- Parameters: 244M
- Hidden Width: 768
- Attention Heads: 12
- Layers: 12
- Input Activation Function: GELU
- Output Activation Function: Softmax
Model Sources
- Repository: openai/whisper-small
- Paper: Robust Speech Recognition via Large-Scale Weak Supervision https://arxiv.org/abs/2212.04356
Uses
Designed for automatic speech recognition, the model generates transcriptions of audio conversations in Spanish.
Direct Use
Returns the transcription text of a voice recording, optimized for Chilean Spanish.
Out-of-Scope Use
Commercial use. The model is intended for research purposes only.
Bias, Risks, and Limitations
This model is an experimental example; many specific words, e.g. names or brands, are not recognized by the model. The effectiveness of the log-mel spectrogram features depends on the amount and variety of data on which the model has been trained; in this particular case, more data will be needed, but with the same dataset structure. Access to this model is restricted, and the specific details about the dataset are private.
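For reference, Whisper's input features are 80-channel log-mel spectrograms computed from 16 kHz audio with a 400-sample FFT window and a 160-sample hop. A minimal NumPy sketch of that front end (a simplification of the real preprocessing, which also pads and normalizes the signal):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=80, n_fft=400, sr=16000):
    """Triangular mel filterbank mapping FFT bins to mel channels."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for j in range(left, center):        # rising slope
            fb[i, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):       # falling slope
            fb[i, j] = (right - j) / max(right - center, 1)
    return fb

def log_mel_spectrogram(audio, sr=16000, n_fft=400, hop=160, n_mels=80):
    """Framed, windowed power spectrum projected onto mel filters, in log10."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log10(np.maximum(mel, 1e-10))
```

One second of 16 kHz audio yields 98 frames of 80 mel channels each with these settings.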
Recommendations
Use with a diarization pipeline such as WhisperX to generate the diarized transcription; for this particular case, the model must first be converted to the faster-whisper (CTranslate2) format.
How to Get Started with the Model
Use the code in the following repository to download, transform, and test the model: https://github.com/al-mldev/whisperX-RAG-Analytics
Training Details
Training Data
The model was fine-tuned with approximately 1,800 audio samples, including segmented tagged audios and audios tagged with specific words.
Training Procedure
This model was finetuned on a dataset built from segmented tagged audio and audios tagged with specific words. The split was 90/5/5: 90% of the data for training, 5% for testing, and 5% for validation.
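The 90/5/5 split can be sketched as follows (a minimal illustration; the function name and seed are placeholders, not the actual training code):

```python
import random

def split_dataset(samples, train_frac=0.90, test_frac=0.05, seed=42):
    """Shuffle and split into train/test/validation; the remainder
    after the train and test fractions goes to validation."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation
```

With the ~1,800 samples mentioned above, this yields 1,620 training, 90 test, and 90 validation samples.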
Training Hyperparameters
- Loss Function: Cross Entropy
- Learning Rate: 2.25 × 10^-5
- Optimization Function: AdamW
- Regularization: Dropout (0.1-0.3)
- Epochs: 22
- Batch Size: 16
- Gradient Accumulation Steps: 1
- Warmup Steps: 25
- Max Steps: 200
- Eval Step: 50
- Save Step: 50
- Eval Batch Size: 8
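The hyperparameters above can be collected into a plain configuration dictionary (an illustrative mapping onto transformers `Seq2SeqTrainingArguments`-style field names; the exact argument names used during training are an assumption):

```python
# Hyperparameters from the table above; field names follow the
# transformers Seq2SeqTrainingArguments convention (assumed).
training_config = {
    "learning_rate": 2.25e-5,
    "optim": "adamw_torch",
    "dropout": 0.1,                        # reported range was 0.1-0.3
    "num_train_epochs": 22,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "warmup_steps": 25,
    "max_steps": 200,
    "eval_steps": 50,
    "save_steps": 50,
    "per_device_eval_batch_size": 8,
}
```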
Testing Data, Factors & Summary
Testing Data
Factors
- Audio Quality: The model was tested with various audio qualities, including noisy and clean audio.
Summary
WER: 25.49%
Training Loss: 2 × 10^-3
Validation Loss: 5.85 × 10^-1
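The reported WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference and hypothesis transcripts, divided by the number of reference words. A minimal sketch of the metric:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length.
    Assumes a non-empty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            prev_diag, d[j] = d[j], min(d[j] + 1,        # deletion
                                        d[j - 1] + 1,    # insertion
                                        prev_diag + cost)  # substitution/match
    return d[len(hyp)] / len(ref)
```

For example, one substituted word in a three-word reference gives a WER of 1/3 ≈ 33.3%.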
Environmental Impact
- Hardware Type: T4
- Hours used: 0.5
- Cloud Provider: Google Cloud Platform
- Compute Region: southamerica-east1
- Carbon Emitted: 0.01 kg of CO2
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).