amiriparian/ExHuBERT

Tags: Audio Classification, Speech Emotion Recognition, Affective Computing

amiriparian committed on Jun 4

Commit 2738a8c (1 parent: d27d836)

Update README.md

Files changed (1): README.md +80 -0

@@ -1,3 +1,83 @@
 ---
 license: cc-by-nc-sa-4.0
+language:
+- en
+- de
+- zh
+- fr
+- nl
+- el
+- it
+library_name: transformers
+pipeline_tag: audio-classification
+tags:
+- HuBERT
+- Speech Emotion Recognition
+- SER
+- PyTorch
 ---
+# **ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets**
+Authors: Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller
+This model builds on [**HuBERT Large**](https://huggingface.co/facebook/hubert-large-ls960-ft) and is fine-tuned on EmoSet++, which comprises 37 datasets totaling 150,907 samples and a cumulative duration of 119.5 hours.
+The model expects a 3-second raw waveform resampled to 16 kHz. The original six output classes are combinations of low/high arousal and negative/neutral/positive valence.
+Further details are available in the corresponding [**paper**](https://arxiv.org/).
+**Note**: This model is for research purposes only.
+### EmoSet++ subsets used for fine-tuning the model:
+|     |    |     |    |     |
+| :---:   | :---: | :---: | :---: | :---: |
+| ABC | AD    | BES    | CASIA   | CVE    |
+| Crema-D | DES   | DEMoS   | EA-ACT   | EA-BMW   |
+| EA-WSJ | EMO-DB    | EmoFilm    | EmotiW-2014   | EMOVO    |
+| eNTERFACE | ESD    | EU-EmoSS    | EU-EV   | FAU Aibo    |
+| GEMEP | GVESS    | IEMOCAP    | MES   |   MESD  |
+| MELD |   PPMMK  |  RAVDESS   |  SAVEE  |   ShEMO  |
+| SmartKom |   SIMIS  |  SUSAS   |  SUBSECO  |   TESS  |
+| TurkishEmo |  Urdu   |     |    |     |
+### Usage
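+```python
+import torch
+import torch.nn as nn
+from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor
+
+# CONFIG and MODEL SETUP
+model_name = '.../HuBERT-EmoSet++'
+feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
+model = HubertForSequenceClassification.from_pretrained(model_name)
+
+# Replace the classification head with a freshly initialized 6-class linear layer.
+# in_features=256 matches the default classifier_proj_size of the HuBERT config.
+# Note: this discards the weights of the head that was just loaded.
+model.classifier = nn.Linear(in_features=256, out_features=6)
+
+sampling_rate = 16000  # the model expects raw audio resampled to 16 kHz
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model = model.to(device)
+```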
+### Citation Info
+```
+@inproceedings{Amiriparian24-EEH,
+  author = {Shahin Amiriparian and Filip Packan and Maurice Gerczuk and Bj\"orn W.\ Schuller},
+  title = {{ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets}},
+  booktitle = {{Proc. INTERSPEECH}},
+  year = {2024},
+  address = {Kos Island, Greece},
+  month = {September},
+  publisher = {ISCA},
+}
+```
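The card states the input contract (3 s of raw audio resampled to 16 kHz, i.e. 3 × 16,000 = 48,000 samples) and shows how to load the model, but not how to pass a clip through it. The following is a minimal sketch of one way to do that, under several assumptions. The helper names `fit_to_clip` and `predict_class` are hypothetical. Center-cropping longer clips and zero-padding shorter ones is an illustrative choice that the card does not specify. Resampling to 16 kHz is assumed to happen beforehand (for example with `torchaudio.functional.resample`). `model` and `feature_extractor` are assumed to be the objects created in the Usage snippet. Because that snippet replaces `model.classifier` with a freshly initialized layer, the predicted index is only meaningful once that head has been trained. The card also does not give the mapping from class index to arousal/valence label.

```python
import numpy as np
import torch

# Input contract stated in the model card: 3 s of raw audio at 16 kHz.
SAMPLING_RATE = 16_000
CLIP_SECONDS = 3
CLIP_SAMPLES = SAMPLING_RATE * CLIP_SECONDS  # 48,000 samples


def fit_to_clip(waveform, num_samples: int = CLIP_SAMPLES) -> np.ndarray:
    """Center-crop a longer mono waveform, or zero-pad a shorter one at the end,
    so that it has exactly `num_samples` samples.

    Assumption: this cropping/padding policy is illustrative only; the card
    does not say how recordings that are not exactly 3 s long should be handled.
    """
    waveform = np.asarray(waveform, dtype=np.float32).reshape(-1)
    if waveform.shape[0] >= num_samples:
        start = (waveform.shape[0] - num_samples) // 2
        return waveform[start:start + num_samples]
    padded = np.zeros(num_samples, dtype=np.float32)
    padded[: waveform.shape[0]] = waveform
    return padded


@torch.no_grad()
def predict_class(waveform, model, feature_extractor, device="cpu"):
    """Return (class_index, class_probabilities) for one mono 16 kHz waveform.

    `model` is assumed to be a HubertForSequenceClassification-style module whose
    output has a `.logits` attribute; `feature_extractor` is assumed to be a
    Wav2Vec2FeatureExtractor (or anything with the same call signature).
    """
    clip = fit_to_clip(waveform)
    # The feature extractor converts the raw waveform into model-ready input_values.
    inputs = feature_extractor(clip, sampling_rate=SAMPLING_RATE, return_tensors="pt")
    model.eval()
    logits = model(inputs["input_values"].to(device)).logits  # shape: (1, num_classes)
    probs = torch.softmax(logits, dim=-1)[0].cpu()
    return int(probs.argmax().item()), probs
```

For example, `predict_class(waveform_16k, model, feature_extractor, device)` returns the arg-max class index and the six class probabilities, where `waveform_16k` is a hypothetical mono NumPy array already sampled at 16 kHz.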