Commit 2738a8c by amiriparian (1 parent: d27d836)
Update README.md

README.md CHANGED
@@ -1,3 +1,83 @@
---
license: cc-by-nc-sa-4.0
language:
- en
- de
- zh
- fr
- nl
- el
- it
library_name: transformers
pipeline_tag: audio-classification
tags:
- HuBERT
- Speech Emotion Recognition
- SER
- PyTorch
---

# **ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets**
Authors: Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller

This model is [**HuBERT Large**](https://huggingface.co/facebook/hubert-large-ls960-ft) fine-tuned on EmoSet++, a collection of 37 datasets totaling 150,907 samples with a cumulative duration of 119.5 hours.
The model expects a 3-second raw waveform resampled to 16 kHz. The six original output classes are combinations of low/high arousal and negative/neutral/positive valence.
Further details are available in the corresponding [**paper**](https://arxiv.org/).
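
The input requirement above (3 seconds at 16 kHz) implies a fixed length of 48,000 samples per clip. The following is a minimal sketch of bringing an already-resampled waveform to that length. The helper name `fit_to_duration`, truncating from the start, and zero-padding at the end are assumptions made for illustration; this card does not state how shorter or longer recordings were handled.

```python
def fit_to_duration(samples, sampling_rate=16000, seconds=3.0):
    """Return a list holding exactly sampling_rate * seconds samples.

    ASSUMPTION: longer clips keep their first samples and shorter clips are
    zero-padded at the end; the model card does not specify either choice.
    """
    target_length = int(round(sampling_rate * seconds))
    samples = list(samples)
    if len(samples) >= target_length:
        # Keep only the leading target_length samples.
        return samples[:target_length]
    # Append silence (zeros) until the clip reaches the target length.
    return samples + [0.0] * (target_length - len(samples))


short_clip = [0.1] * 16000  # 1 second of placeholder audio at 16 kHz
long_clip = [0.1] * 80000   # 5 seconds of placeholder audio at 16 kHz
print(len(fit_to_duration(short_clip)), len(fit_to_duration(long_clip)))
```

With NumPy or PyTorch tensors the same idea would normally be written with slicing and a padding call rather than Python lists; lists are used here only to keep the sketch dependency-free.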

**Note**: This model is for research purposes only.

### EmoSet++ subsets used for fine-tuning the model:

| | | | | |
| :---: | :---: | :---: | :---: | :---: |
| ABC | AD | BES | CASIA | CVE |
| Crema-D | DES | DEMoS | EA-ACT | EA-BMW |
| EA-WSJ | EMO-DB | EmoFilm | EmotiW-2014 | EMOVO |
| eNTERFACE | ESD | EU-EmoSS | EU-EV | FAU Aibo |
| GEMEP | GVESS | IEMOCAP | MES | MESD |
| MELD | PPMMK | RAVDESS | SAVEE | ShEMO |
| SmartKom | SIMIS | SUSAS | SUBSECO | TESS |
| TurkishEmo | Urdu | | | |

### Usage

```python
import torch
import torch.nn as nn
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

# CONFIG and MODEL SETUP
model_name = '.../HuBERT-EmoSet++'
# Feature extractor that turns raw waveforms into model inputs.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model = HubertForSequenceClassification.from_pretrained(model_name)
# Six-way output head, one unit per arousal/valence class.
# Note: a newly constructed nn.Linear starts with randomly initialized weights.
model.classifier = nn.Linear(in_features=256, out_features=6)

sampling_rate = 16000
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```
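
Once the model has produced six logits for a clip (with the setup above, presumably read from the `logits` field of the model output), they can be converted into probabilities and a predicted class. The sketch below is plain Python and covers only that post-processing step. The helper names and, in particular, the ordering of `CLASS_NAMES` are assumptions for illustration, because the index-to-label mapping is not given in this card; the example logits are made up and are not model output.

```python
import math
from itertools import product

# ASSUMPTION: hypothetical label order (arousal varies slowest). The real
# index-to-label mapping of the six output classes is not documented here.
CLASS_NAMES = [
    f"{arousal} arousal, {valence} valence"
    for arousal, valence in product(("low", "high"),
                                    ("negative", "neutral", "positive"))
]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps math.exp numerically stable.
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def top_class(logits):
    """Return the (label, probability) pair with the highest probability."""
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASS_NAMES[best], probabilities[best]


example_logits = [0.2, -1.0, 0.5, 2.3, 0.0, -0.7]  # made-up values
label, probability = top_class(example_logits)
print(f"{label} ({probability:.2f})")
```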

### Citation Info

```
@inproceedings{Amiriparian24-EEH,
  author = {Shahin Amiriparian and Filip Packan and Maurice Gerczuk and Bj\"orn W.\ Schuller},
  title = {{ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets}},
  booktitle = {{Proc. INTERSPEECH}},
  year = {2024},
  editor = {},
  volume = {},
  series = {},
  address = {Kos Island, Greece},
  month = {September},
  publisher = {ISCA},
}
```