ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition

ehcalabres committed on Jul 14, 2021

Commit 874fe61 • 1 Parent(s): ebe7999

Update README.md

Files changed (1)

README.md +10 -2


@@ -11,9 +11,13 @@ model_index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# wav2vec2-lg-xlsr-en-speech-emotion-recognition
-This model is a fine-tuned version of [jonatasgrosman/wav2vec2-large-xlsr-53-english](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english) for a Speech Emotion Recognition (SER) task.
+# Speech Emotion Recognition By Fine-Tuning Wav2Vec 2.0
+__Important:__ This model it's not yet implementable due to missing built-in functions in HuggingFace for speech classification tasks. I'm working on the instructions of how to use it and a repository where the code will be available soon. Thank you anyway!
+The model is a fine-tuned version of [jonatasgrosman/wav2vec2-large-xlsr-53-english](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english) for a Speech Emotion Recognition (SER) task.
 The dataset used to fine-tune the original pre-trained model is the [RAVDESS dataset](https://zenodo.org/record/1188976#.YO6yI-gzaUk). This dataset provides 1440 samples of recordings from actors performing on 8 different emotions in English, which are:
@@ -72,6 +76,10 @@ The following hyperparameters were used during training:
 | 0.4581        | 2.72  | 390  | 0.4719          | 0.8467   |
 | 0.3967        | 2.93  | 420  | 0.5023          | 0.8223   |
+## Contact
+Any doubt, contact me on [Twitter](https://twitter.com/ehcalabres) (GitHub repo soon).
 ### Framework versions