---
license: cc-by-sa-4.0
language:
- en
- zh
metrics:
- f1
library_name: transformers
pipeline_tag: audio-classification
tags:
- speech-emotion-recognition
---
# Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition
This model is [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) fine-tuned on English and Chinese data from elderly speakers.
The model is trained on the training sets of [CREMA-D](https://github.com/CheyneyComputerScience/CREMA-D), [CSED](https://github.com/AkishinoShiame/Chinese-Speech-Emotion-Datasets), [ElderReact](https://github.com/Mayer123/ElderReact), and [TESS](https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess).
When using this model, make sure that your speech input is sampled at 16 kHz.
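
The 16 kHz requirement can be met with a small preprocessing step. The sketch below is not taken from the training scripts: `to_16k_mono` and `classify` are hypothetical helper names, the polyphase resampler from SciPy is one choice among several, and `model_id` is a placeholder for this repository's id. It also assumes the checkpoint loads with the stock `transformers` audio-classification pipeline, which this card's metadata suggests but which has not been verified here.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sampling rate this model card asks for


def to_16k_mono(waveform, sr):
    """Return a mono float32 waveform resampled to 16 kHz.

    `waveform` is 1-D (samples,) or 2-D (samples, channels).
    """
    x = np.asarray(waveform, dtype=np.float32)
    if x.ndim == 2:
        x = x.mean(axis=1)  # average the channels into mono
    if sr != TARGET_SR:
        g = gcd(TARGET_SR, sr)
        # Polyphase resampling applies an anti-aliasing filter internally.
        x = resample_poly(x, TARGET_SR // g, sr // g)
    return x.astype(np.float32)


def classify(waveform, sr, model_id):
    """Hypothetical helper: run the audio-classification pipeline.

    `model_id` must be supplied by the caller; it is not stated in this card.
    Assumption: the checkpoint is compatible with the stock pipeline.
    """
    from transformers import pipeline  # imported lazily; needs transformers + torch

    clf = pipeline("audio-classification", model=model_id)
    # A raw NumPy array is interpreted as audio at the model's sampling rate.
    return clf(to_16k_mono(waveform, sr))


# One second of a 440 Hz tone at 44.1 kHz, standing in for real speech.
sr_in = 44_100
t = np.arange(sr_in) / sr_in
tone = np.sin(2 * np.pi * 440 * t)
resampled = to_16k_mono(tone, sr_in)
print(resampled.shape, resampled.dtype)
```

Audio loaded with any reader that returns samples and a sampling rate can be passed through `to_16k_mono` before inference. If the pipeline assumption does not hold for this checkpoint, the training and evaluation scripts linked below are the reference for loading the model.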
The scripts used for training and evaluation can be found here:
[https://github.com/HLTCHKUST/elderly_ser/tree/main](https://github.com/HLTCHKUST/elderly_ser/tree/main)
## Evaluation Results
For the details (e.g., the statistics of `train`, `valid`, and `test` data), please refer to our paper on [arXiv](https://arxiv.org/abs/2306.14517).
The paper also reports the model's speech emotion recognition performance on English-All, Chinese-All, English-Elderly, Chinese-Elderly, English-Adults, and Chinese-Adults.
## Citation
Our paper will be published at INTERSPEECH 2023. In the meantime, you can find our paper on [arXiv](https://arxiv.org/abs/2306.14517).
If you find our work useful, please consider citing our paper as follows:
```bibtex
@misc{cahyawijaya2023crosslingual,
title={Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition},
author={Samuel Cahyawijaya and Holy Lovenia and Willy Chung and Rita Frieske and Zihan Liu and Pascale Fung},
year={2023},
eprint={2306.14517},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```