metadata
license: apache-2.0
tags:
  - generated_from_trainer
metrics:
  - accuracy
model_index:
  name: wav2vec2-lg-xlsr-en-speech-emotion-recognition

Speech Emotion Recognition By Fine-Tuning Wav2Vec 2.0

The model is a fine-tuned version of jonatasgrosman/wav2vec2-large-xlsr-53-english for a Speech Emotion Recognition (SER) task.

Several datasets were used to fine-tune the original model:
Surrey Audio-Visual Expressed Emotion (SAVEE) (http://kahlan.eps.surrey.ac.uk/savee/Database.html)

  • 480 audio files from 4 male actors

Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) (https://zenodo.org/record/1188976#.YO6yI-gzaUk)

  • 1440 audio files from 24 professional actors (12 female, 12 male)

Toronto emotional speech set (TESS) (https://tspace.library.utoronto.ca/handle/1807/24487)

  • 2800 audio files from 2 female actors

7 classification labels:

emotions = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']
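As a minimal sketch of how the seven labels could be used downstream, the snippet below builds index-to-label mappings and picks the most likely emotion from a vector of raw class scores. The index order is an assumption (the card does not say which class id corresponds to which emotion), and `predict_emotion` and the example logits are hypothetical names and values introduced only for illustration.

```python
import math

# Label order copied from the list above. ASSUMPTION: class id i maps to
# EMOTIONS[i]; the card does not confirm this ordering.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

id2label = dict(enumerate(EMOTIONS))
label2id = {label: idx for idx, label in id2label.items()}


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(logits):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(EMOTIONS):
        raise ValueError(f"expected {len(EMOTIONS)} scores, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical logits, made up for illustration; not real model output.
example_logits = [0.1, -1.2, 0.3, 2.5, 0.0, -0.4, 1.1]
label, prob = predict_emotion(example_logits)
print(label, round(prob, 3))
```

In practice the scores would come from the model's classification head; the mapping step stays the same as long as the label order matches the one used in training.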

It achieves the following results on the evaluation set:

  • Loss: 0.5023
  • Accuracy: 0.8223

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
  • mixed_precision_training: Native AMP
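The relationship between some of these values can be sketched in a few lines: the total train batch size is the per-device batch size multiplied by the gradient accumulation steps, and a linear scheduler decays the learning rate from its base value toward zero. This is only an illustration under stated assumptions: a single device, no warmup (the card lists none), and a hypothetical `total_steps` argument, since the card does not state the total number of optimizer steps.

```python
LEARNING_RATE = 1e-4   # learning_rate from the list above
TRAIN_BATCH_SIZE = 4   # train_batch_size
GRAD_ACCUM_STEPS = 2   # gradient_accumulation_steps


def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer update.

    ASSUMPTION: num_devices=1, which is consistent with 4 * 2 = 8.
    """
    return per_device_batch * accumulation_steps * num_devices


def linear_lr(step, total_steps, base_lr=LEARNING_RATE, warmup_steps=0):
    """Linear decay to zero, with an optional linear warmup.

    ASSUMPTION: warmup_steps=0; total_steps is a hypothetical parameter.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
print(linear_lr(step=500, total_steps=1000))
```

Gradient accumulation trades wall-clock time for memory: two forward/backward passes of 4 samples each are summed before one optimizer step, which approximates a batch of 8.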

Training results

Step    Training Loss    Validation Loss    Accuracy
500     1.812400         1.365212           0.486258
1000    0.887200         0.773145           0.797040
1500    0.703500         0.574954           0.852008
2000    0.687900         1.286738           0.775899
2500    0.649800         0.697455           0.832981
3000    0.569600         0.337240           0.892178
3500    0.421800         0.307072           0.911205
4000    0.308800         0.374443           0.930233
4500    0.268800         0.260444           0.936575
5000    0.297300         0.302985           0.923890
5500    0.176500         0.165439           0.961945
6000    0.147500         0.170199           0.961945
6500    0.127400         0.155310           0.966173
7000    0.069900         0.103882           0.976744
7500    0.083000         0.104075           0.974630
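To make the logged checkpoints easier to inspect, the sketch below loads the values above into Python and picks the checkpoint with the lowest validation loss and the one with the highest accuracy. It also lists the steps where validation loss rose relative to the previous logged step. The variable names are illustrative, and choosing a checkpoint this way is a common convention, not something the card says the author did.

```python
# (step, training_loss, validation_loss, accuracy), copied from the results above.
RESULTS = [
    (500, 1.812400, 1.365212, 0.486258),
    (1000, 0.887200, 0.773145, 0.797040),
    (1500, 0.703500, 0.574954, 0.852008),
    (2000, 0.687900, 1.286738, 0.775899),
    (2500, 0.649800, 0.697455, 0.832981),
    (3000, 0.569600, 0.337240, 0.892178),
    (3500, 0.421800, 0.307072, 0.911205),
    (4000, 0.308800, 0.374443, 0.930233),
    (4500, 0.268800, 0.260444, 0.936575),
    (5000, 0.297300, 0.302985, 0.923890),
    (5500, 0.176500, 0.165439, 0.961945),
    (6000, 0.147500, 0.170199, 0.961945),
    (6500, 0.127400, 0.155310, 0.966173),
    (7000, 0.069900, 0.103882, 0.976744),
    (7500, 0.083000, 0.104075, 0.974630),
]

# Checkpoint with the lowest validation loss, and with the highest accuracy.
best_by_loss = min(RESULTS, key=lambda row: row[2])
best_by_accuracy = max(RESULTS, key=lambda row: row[3])

# Steps where validation loss went up compared with the previous logged step.
regressions = [
    current[0]
    for previous, current in zip(RESULTS, RESULTS[1:])
    if current[2] > previous[2]
]

print("lowest validation loss at step", best_by_loss[0])
print("highest accuracy at step", best_by_accuracy[0])
print("validation loss increased at steps", regressions)
```

On these numbers both criteria select step 7000, and the spike at step 2000 is the only large regression; the later increases are small.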