
Model Card for whisper-small-es-cl

Finetuned Whisper Model for Automatic Speech Recognition in Spanish from Chile

Model Details

Whisper is a model from OpenAI, built on an encoder-decoder Transformer architecture, designed to generate transcription text from audio sequences. This model can be used with the WhisperX pipeline developed by @m-bain, which combines Whisper transcription with the pyannote library to perform diarization. Diarization separates and classifies the speakers in the audio, producing a transcription segmented by speaker: the pipeline returns a transcription object whose segments are associated with specific speakers.
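A minimal sketch of that flow, assuming the whisperx package, a GPU, and a Hugging Face token with access to the gated pyannote models; the file name and model path are placeholders:

```python
import whisperx

device = "cuda"
audio = whisperx.load_audio("conversation.wav")  # placeholder file name

# 1. Transcribe with a CTranslate2-converted Whisper model
#    (model directory name is an assumption; see "Recommendations" below)
model = whisperx.load_model("whisper-small-es-cl-ct2", device,
                            compute_type="float16", language="es")
result = model.transcribe(audio, batch_size=16)

# 2. Align the transcript to get word-level timestamps for Spanish
align_model, metadata = whisperx.load_align_model(language_code="es", device=device)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

# 3. Diarize with pyannote and assign a speaker to each segment
diarize_model = whisperx.DiarizationPipeline(use_auth_token="hf_...", device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

for segment in result["segments"]:
    print(segment.get("speaker", "UNKNOWN"), segment["text"])
```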

Model Description

  • Developed by: OpenAI
  • Model type: Automatic Speech Recognition
  • Language(s) (NLP): Chilean Spanish
  • License: MIT
  • Finetuned from model: openai/whisper-small (Whisper-Small)

Technical Specifications

  • Name: whisper-small-es-cl
  • Model Type: Sequence-to-sequence automatic speech recognition
  • Parameters: ~242M
  • Model Width (hidden size): 768
  • Attention Heads: 12
  • Layers: 12 (encoder) + 12 (decoder)
  • Hidden Activation Function: GELU
  • Output Activation Function: Softmax (over the vocabulary)
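These figures match the openai/whisper-small base configuration and can be verified programmatically; a quick check, assuming the Hugging Face transformers library:

```python
from transformers import WhisperConfig

# Inspect the architecture of the base checkpoint this model was fine-tuned from
config = WhisperConfig.from_pretrained("openai/whisper-small")
print(config.d_model)                  # hidden width: 768
print(config.encoder_layers)           # encoder layers: 12
print(config.encoder_attention_heads)  # attention heads: 12
print(config.activation_function)      # "gelu"
```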

Model Sources

  • Repository: https://github.com/al-mldev/whisperX-RAG-Analytics

Uses

Designed for automatic speech recognition, the model can generate transcriptions of audio conversations in any variety of Spanish.

Direct Use

Returns the transcription text of a speech recording, optimized for Chilean Spanish.

Out-of-Scope Use

The model is intended for research purposes only; commercial use is out of scope.

Bias, Risks, and Limitations

This model is just an experimental example: many specific words, e.g. proper names or brands, are not recognized by the model. The effectiveness of the log-Mel spectrogram representation depends on the amount and variety of data the model has been trained on; in this particular case, more data would be needed, with the same dataset structure. Access to the model is restricted, and the specific details about the dataset are private.

Recommendations

Use with a diarization pipeline such as WhisperX to generate a diarized transcription; for this particular case, the model must first be converted to the faster-whisper (CTranslate2) format, as sketched below.
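A minimal conversion sketch, assuming the ctranslate2 package; the Hugging Face repository id below is a placeholder for wherever the fine-tuned weights live:

```python
from ctranslate2.converters import TransformersConverter

# Convert the Transformers checkpoint to the CTranslate2 format that
# faster-whisper / WhisperX consume. Repository id is illustrative.
converter = TransformersConverter("your-org/whisper-small-es-cl")
converter.convert(output_dir="whisper-small-es-cl-ct2", quantization="float16")
```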

How to Get Started with the Model

Use the code below to get started with the model: download, convert, and test. https://github.com/al-mldev/whisperX-RAG-Analytics
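For a quick test without the diarization pipeline, the model can also be run directly through the Transformers ASR pipeline (a sketch; the repository id and file name are placeholders):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint; the repository id is illustrative.
asr = pipeline("automatic-speech-recognition", model="your-org/whisper-small-es-cl")

# Transcribe a local recording; Whisper resamples input to 16 kHz internally
result = asr(
    "sample_es_cl.wav",
    generate_kwargs={"language": "spanish", "task": "transcribe"},
)
print(result["text"])
```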

Training Details

Training Data

The model was fine-tuned with approximately 1,800 audio samples, including segmented tagged audio and audio tagged with specific words.

Dataset structure

Training Procedure

This model was fine-tuned on a dataset of segmented tagged audio and audio tagged with specific words. The data was split 90/5/5: 90% for training, 5% for testing, and 5% for validation. The hyperparameters are listed below, followed by a sketch of how they map to a training configuration.

Training Hyperparameters
  • Loss Function: Cross-Entropy
  • Learning Rate: 2.25 × 10^-5
  • Optimizer: AdamW
  • Regularization: Dropout (0.1–0.3)
  • Epochs: 22
  • Batch Size: 16
  • Gradient Accumulation Steps: 1
  • Warmup Steps: 25
  • Max Steps: 200
  • Eval Steps: 50
  • Save Steps: 50
  • Eval Batch Size: 8
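As a sketch, these settings map onto the Transformers Seq2SeqTrainingArguments roughly as follows (an assumed reconstruction, not the exact training script; dropout is configured on the model itself, and cross-entropy is the default loss for Whisper's decoder head):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-es-cl",  # illustrative
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    learning_rate=2.25e-5,               # AdamW is the Trainer default optimizer
    warmup_steps=25,
    max_steps=200,                       # takes precedence over the 22 epochs if both are set
    evaluation_strategy="steps",
    eval_steps=50,
    save_steps=50,
)
```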

Testing Data, Factors & Summary

Testing Data

Factors

  • Audio Quality: The model was tested with various audio qualities, including noisy and clean audio.

Summary

  • WER: 25.49%
  • Training Loss: 2 × 10^-3
  • Validation Loss: 5.85 × 10^-1
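A word error rate like the one above can be computed with the Hugging Face evaluate library; a minimal sketch with illustrative strings:

```python
import evaluate

# WER = (substitutions + deletions + insertions) / words in the reference
wer_metric = evaluate.load("wer")

predictions = ["hola como estay"]  # illustrative model output
references = ["hola cómo estái"]   # illustrative ground truth
print(f"WER: {wer_metric.compute(predictions=predictions, references=references):.2%}")
```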

Environmental Impact

  • Hardware Type: T4
  • Hours used: 0.5
  • Cloud Provider: Google Cloud Platform
  • Compute Region: southamerica-east1
  • Carbon Emitted: 0.01 kg of CO2

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
