Wav2vec2-xlsr-Shemo

This model is a fine-tuned version of ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition on the minoosh/shEMO dataset. After the final training epoch (epoch 30), it achieves the following results on the evaluation set:

  • Loss: 0.9168
  • Accuracy: 0.7267

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.003
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
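
The derived quantities implied by these settings can be checked with a little arithmetic. The sketch below is illustrative rather than the training script itself: it takes the 150 optimizer steps per epoch from the Step column of the training results, and it assumes that the `linear` scheduler warms up linearly from zero over the warmup fraction and then decays linearly to zero. The function name `linear_schedule_lr` is hypothetical.

```python
# Values copied from the hyperparameter list; STEPS_PER_EPOCH is read off the
# training results table (step 150 is logged at epoch 1.0).
LEARNING_RATE = 0.003
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1
NUM_EPOCHS = 30
STEPS_PER_EPOCH = 150

# Gradient accumulation multiplies the per-step batch size.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Total optimizer steps and the number of those spent warming up.
total_steps = NUM_EPOCHS * STEPS_PER_EPOCH
warmup_steps = round(total_steps * WARMUP_RATIO)


def linear_schedule_lr(step: int) -> float:
    """Learning rate at an optimizer step (assumed warmup-then-decay shape)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / warmup_steps
    # Linear decay from the peak down to 0 at the last step.
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    print("total steps:", total_steps, "warmup steps:", warmup_steps)
    for step in (0, 225, 450, 2475, 4500):
        print(step, linear_schedule_lr(step))
```

Under these assumptions the effective batch size matches the listed total_train_batch_size of 16, and the learning rate peaks at 0.003 at step 450, the end of epoch 3.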

Training results

Training Loss  Epoch  Step  Validation Loss  Accuracy
1.1825         1.0    150   1.1383           0.6267
1.3392         2.0    300   1.4398           0.5533
1.2058         3.0    450   1.1194           0.6300
1.0984         4.0    600   1.2049           0.6200
1.0033         5.0    750   1.0080           0.6500
0.9694         6.0    900   0.9878           0.6367
0.8506         7.0    1050  0.8965           0.7033
0.8068         8.0    1200  0.9359           0.6833
0.7674         9.0    1350  1.1235           0.6333
0.7817         10.0   1500  0.8682           0.6900
0.7172         11.0   1650  0.8289           0.7067
0.6989         12.0   1800  0.9318           0.7000
0.6127         13.0   1950  0.8712           0.6967
0.6311         14.0   2100  0.8965           0.7133
0.5901         15.0   2250  0.9008           0.7267
0.5667         16.0   2400  1.0093           0.7200
0.5652         17.0   2550  0.9032           0.7300
0.5650         18.0   2700  0.9317           0.7267
0.5705         19.0   2850  1.0134           0.7133
0.4984         20.0   3000  0.9432           0.7367
0.5207         21.0   3150  0.9368           0.6933
0.5005         22.0   3300  0.9746           0.7033
0.5055         23.0   3450  1.0437           0.7133
0.4867         24.0   3600  1.0052           0.7067
0.5315         25.0   3750  0.9689           0.7200
0.4755         26.0   3900  0.8962           0.7367
0.5083         27.0   4050  0.9319           0.7300
0.4661         28.0   4200  0.9301           0.7233
0.4536         29.0   4350  0.9370           0.7267
0.4693         30.0   4500  0.9168           0.7267
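
The headline metrics come from the last row of the table, which is not the best row. As a small illustrative sketch (not part of the original training code), the snippet below copies the table into a list and finds the epochs with the highest accuracy and the lowest validation loss. The variable names are hypothetical.

```python
# (epoch, training_loss, validation_loss, accuracy), copied from the table above.
RESULTS = [
    (1, 1.1825, 1.1383, 0.6267), (2, 1.3392, 1.4398, 0.5533),
    (3, 1.2058, 1.1194, 0.6300), (4, 1.0984, 1.2049, 0.6200),
    (5, 1.0033, 1.0080, 0.6500), (6, 0.9694, 0.9878, 0.6367),
    (7, 0.8506, 0.8965, 0.7033), (8, 0.8068, 0.9359, 0.6833),
    (9, 0.7674, 1.1235, 0.6333), (10, 0.7817, 0.8682, 0.6900),
    (11, 0.7172, 0.8289, 0.7067), (12, 0.6989, 0.9318, 0.7000),
    (13, 0.6127, 0.8712, 0.6967), (14, 0.6311, 0.8965, 0.7133),
    (15, 0.5901, 0.9008, 0.7267), (16, 0.5667, 1.0093, 0.7200),
    (17, 0.5652, 0.9032, 0.7300), (18, 0.5650, 0.9317, 0.7267),
    (19, 0.5705, 1.0134, 0.7133), (20, 0.4984, 0.9432, 0.7367),
    (21, 0.5207, 0.9368, 0.6933), (22, 0.5005, 0.9746, 0.7033),
    (23, 0.5055, 1.0437, 0.7133), (24, 0.4867, 1.0052, 0.7067),
    (25, 0.5315, 0.9689, 0.7200), (26, 0.4755, 0.8962, 0.7367),
    (27, 0.5083, 0.9319, 0.7300), (28, 0.4661, 0.9301, 0.7233),
    (29, 0.4536, 0.9370, 0.7267), (30, 0.4693, 0.9168, 0.7267),
]

# Highest evaluation accuracy and every epoch that reached it.
best_accuracy = max(row[3] for row in RESULTS)
best_accuracy_epochs = [row[0] for row in RESULTS if row[3] == best_accuracy]

# Row with the lowest validation loss.
lowest_val_loss_row = min(RESULTS, key=lambda row: row[2])

# Final-epoch row, which is what the card reports as its headline result.
final_row = RESULTS[-1]

if __name__ == "__main__":
    print("best accuracy:", best_accuracy, "at epochs", best_accuracy_epochs)
    print("lowest validation loss:", lowest_val_loss_row[2],
          "at epoch", lowest_val_loss_row[0])
    print("final epoch:", final_row)
```

On these numbers, accuracy peaks at 0.7367 (epochs 20 and 26) and validation loss bottoms out at 0.8289 (epoch 11), while training loss keeps falling afterwards. Whether an earlier checkpoint was saved or would generalize better is not stated in the card.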

Framework versions

  • Transformers 4.29.2
  • Pytorch 2.0.1+cu117
  • Datasets 2.12.0
  • Tokenizers 0.13.3