metadata
frameworks:
- Pytorch
license: apache-2.0
tasks:
- emotion-recognition
tags:
- audio
- audio-classification
- speech
- speech-emotion-recognition
Requirements
- modelscope>=1.11.1
- funasr>=1.0.5
Usage
Inference with modelscope
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.emotion_recognition,
    model="iic/emotion2vec_base_finetuned",
    model_revision="v2.0.4",
)
rec_result = inference_pipeline(
    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
    granularity="utterance",
    extract_embedding=False,
)
print(rec_result)
Inference with FunASR
from funasr import AutoModel
model = AutoModel(model="iic/emotion2vec_base_finetuned", model_revision="v2.0.4")
wav_file = f"{model.model_path}/example/test.wav"
res = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
print(res)
Note: the model is downloaded automatically.
A file list in Kaldi-style wav.scp format is also supported as input:
wav_name1 wav_path1.wav
wav_name2 wav_path2.wav
...
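A wav.scp file like the one above can be generated with a few lines of Python. The following is a minimal sketch, not part of the emotion2vec or FunASR code: the helper name `write_wav_scp` and the choice of the file stem as the utterance name are assumptions made for illustration.

```python
from pathlib import Path


def write_wav_scp(wav_dir, scp_path):
    """Write one "<name> <path>" line per .wav file found directly in wav_dir.

    Hypothetical helper: the utterance name is taken from the file stem,
    and the path is written as an absolute path.
    """
    wav_files = sorted(Path(wav_dir).glob("*.wav"))
    lines = [f"{wav.stem} {wav.resolve()}" for wav in wav_files]
    # Kaldi-style lists are whitespace separated, one utterance per line.
    text = "\n".join(lines) + ("\n" if lines else "")
    Path(scp_path).write_text(text, encoding="utf-8")
    return len(lines)
```

The resulting wav.scp path could then be passed to the inference call in place of a single audio path. Because the format is whitespace separated, utterance names should not contain spaces.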
The output is the emotion representation vector, saved in output_dir in NumPy format (it can be loaded with np.load()).
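Reading the saved vectors back could look like the sketch below. It assumes that the embeddings were written to output_dir as .npy files (the example calls above pass extract_embedding=False, so embedding extraction presumably has to be enabled first) and that one file is written per utterance. The helper name `load_embeddings` and the file layout are assumptions, not something this repository documents.

```python
from pathlib import Path

import numpy as np


def load_embeddings(output_dir):
    """Return {file stem: array} for every .npy file under output_dir.

    Hypothetical helper: searches recursively because the exact
    layout of output_dir is not specified here.
    """
    embeddings = {}
    for npy_path in sorted(Path(output_dir).rglob("*.npy")):
        embeddings[npy_path.stem] = np.load(npy_path)
    return embeddings
```

Calling `load_embeddings("./outputs")` returns a dictionary keyed by file name without extension. Checking each array's `.shape` is a quick way to tell utterance-level vectors from frame-level ones.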
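Notes
This repository is the modelscope version of emotion2vec; the model parameters are identical to the original.
Original repository: https://github.com/ddlBoJack/emotion2vec
modelscope version repository: https://github.com/alibaba-damo-academy/FunASR
Related paper and citation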
@article{ma2023emotion2vec,
title={emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation},
author={Ma, Ziyang and Zheng, Zhisheng and Ye, Jiaxin and Li, Jinchao and Gao, Zhifu and Zhang, Shiliang and Chen, Xie},
journal={arXiv preprint arXiv:2312.15185},
year={2023}
}