r-f/wav2vec-english-speech-emotion-recognition

r-f committed on Sep 24, 2022

Commit 129d357 • 1 Parent(s): 574f7d9

Update README.md

Files changed (1)

README.md +8 -8


@@ -5,7 +5,7 @@ tags:
 metrics:
 - accuracy
 model_index:
-  name: wav2vec2-lg-xlsr-en-speech-emotion-recognition
+  name: wav2vec-english-speech-emotion-recognition
 ---
 # Speech Emotion Recognition By Fine-Tuning Wav2Vec 2.0
 The model is a fine-tuned version of [jonatasgrosman/wav2vec2-large-xlsr-53-english](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english) for a Speech Emotion Recognition (SER) task.]
@@ -20,13 +20,13 @@ Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) (https://ze
 Toronto emotional speech set (TESS) (https://tspace.library.utoronto.ca/handle/1807/24487)
 - 2800 audio files from 2 female actors
-7 classifcation labels
+7 labels/emotions were used as classification labels
 ```python
 emotions = ['angry' 'disgust' 'fear' 'happy' 'neutral' 'sad' 'surprise']
 ```
 It achieves the following results on the evaluation set:
-- Loss: 0.5023
-- Accuracy: 0.8223
+- Loss: 0.104075
+- Accuracy: 0.97463
 ## Model description
 More information needed
 ## Intended uses & limitations
@@ -39,13 +39,13 @@ The following hyperparameters were used during training:
 - learning_rate: 0.0001
 - train_batch_size: 4
 - eval_batch_size: 4
+- eval_steps: 500
 - seed: 42
 - gradient_accumulation_steps: 2
-- total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- num_epochs: 3
-- mixed_precision_training: Native AMP
+- num_epochs: 4
+- max_steps=7500
+- save_steps: 1500
 ### Training results
 | Step | Training Loss | Validation Loss | Accuracy |
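
A note on the `emotions` snippet that this commit leaves unchanged: as written in the README, the list has no commas between the string literals. Python joins adjacent string literals, so the expression evaluates to a one-element list rather than seven labels. The sketch below shows the difference. The variable name `as_written` is only for illustration, and the comma-separated form is an assumption about what the README intends, not something taken from the repository's code.

```python
# The list exactly as it appears in the README: adjacent string
# literals are concatenated by Python into a single string.
as_written = ['angry' 'disgust' 'fear' 'happy' 'neutral' 'sad' 'surprise']

# Assumed intended form: one list element per emotion label.
emotions = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']

print(len(as_written))  # number of elements in the README version
print(len(emotions))    # number of elements in the comma-separated version
```

If the README list is meant to be copied into code, the comma-separated form is the one that matches the "7 labels" statement in the card.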