TensorFlow implementation of "Attentive Modality Hopping for Speech Emotion Recognition," ICASSP-20
This repository contains the source code used in the following paper:
Attentive Modality Hopping Mechanism for Speech Emotion Recognition, [paper]
I recently found that I used the “precision” metric for model evaluation.
When I change the metric from “precision” to “accuracy,” the models show similar performance in the “weighted” case but lower performance in the “unweighted” case.
This behavior is observed for the other models as well (MHA, MDRE).
I have already revised the source code; you can change the metric in “project_config.py” (see the sketch after the flag descriptions below).
USE_PRECISION = True --> "precision" metric
USE_PRECISION = False --> "accuracy" metric
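A minimal sketch (not the repository's actual evaluation code) of how the two metric settings could be computed with scikit-learn; the exact averaging used in the paper may differ, so treat the `average` choices below as assumptions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

USE_PRECISION = True  # toggle as in project_config.py

def evaluate(y_true, y_pred):
    """Return (weighted, unweighted) scores for one test run."""
    if USE_PRECISION:
        # "precision" metric: class-frequency-weighted vs. unweighted (macro) precision
        weighted = precision_score(y_true, y_pred, average='weighted')
        unweighted = precision_score(y_true, y_pred, average='macro')
    else:
        # "accuracy" metric: overall accuracy vs. mean per-class recall
        weighted = accuracy_score(y_true, y_pred)
        unweighted = recall_score(y_true, y_pred, average='macro')
    return weighted, unweighted

# toy example
print(evaluate(np.array([0, 1, 2, 2]), np.array([0, 1, 1, 2])))
```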
Precision (previously misreported as accuracy)
Model | Modality | Weighted | Unweighted |
---|---|---|---|
MDRE[9] | A+T | 0.557 ± 0.018 | 0.536 ± 0.030 |
MDRE[9] | T+V | 0.585 ± 0.040 | 0.561 ± 0.046 |
MDRE[9] | A+V | 0.481 ± 0.049 | 0.415 ± 0.047 |
MHA[12] | A+T | 0.583 ± 0.025 | 0.555 ± 0.040 |
MHA[12] | T+V | 0.590 ± 0.017 | 0.560 ± 0.032 |
MHA[12] | A+V | 0.490 ± 0.049 | 0.434 ± 0.060 |
MDRE[9] | A+T+V | 0.602 ± 0.033 | 0.575 ± 0.046 |
AMH(ours) | A+T+V | 0.624 ± 0.022 | 0.597 ± 0.040 |
Accuracy (revised results)
Model | Modality | Weighted | Unweighted |
---|---|---|---|
MDRE[9] | A+T | 0.498 ± 0.059 | 0.418 ± 0.077 |
MDRE[9] | T+V | 0.579 ± 0.015 | 0.524 ± 0.021 |
MDRE[9] | A+V | 0.477 ± 0.025 | 0.376 ± 0.024 |
MHA[12] | A+T | 0.543 ± 0.026 | 0.491 ± 0.028 |
MHA[12] | T+V | 0.580 ± 0.019 | 0.526 ± 0.024 |
MHA[12] | A+V | 0.471 ± 0.047 | 0.371 ± 0.042 |
MDRE[9] | A+T+V | 0.564 ± 0.043 | 0.490 ± 0.056 |
AMH(ours) | A+T+V | 0.617 ± 0.016 | 0.547 ± 0.025 |
tensorflow==1.14 (tested on cuda-10.1, cudnn-7.6)
python==3.7
scikit-learn>=0.20.0
nltk>=3.3
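A quick environment sanity check (a sketch; the expected versions simply restate the requirements above):

```python
import sys
import tensorflow as tf
import sklearn
import nltk

# Expected versions per the requirements above
print("python      :", sys.version.split()[0])   # 3.7.x
print("tensorflow  :", tf.__version__)           # 1.14
print("scikit-learn:", sklearn.__version__)      # >= 0.20.0
print("nltk        :", nltk.__version__)         # >= 3.3
```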
If you want to download the “preprocessed dataset,” please first request the license from the IEMOCAP team.
Audio: [#samples, 1000, 120] - (#samples, sequence (max 10 s), dims)
Text (index): [#samples, 128] - (#samples, sequence (max))
Video: [#samples, 32, 2048] - (#samples, sequence (max 10.6 s), dims)
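A minimal sketch of placeholder arrays with the expected input shapes (illustration only; the real features require the IEMOCAP license as described above):

```python
import numpy as np

num_samples = 7487  # total of the emotion-class counts listed below; use your own split size

audio = np.zeros((num_samples, 1000, 120), dtype=np.float32)  # acoustic frame features, max 10 s
text  = np.zeros((num_samples, 128), dtype=np.int32)          # word-index sequences
video = np.zeros((num_samples, 32, 2048), dtype=np.float32)   # visual feature sequences, max 10.6 s

print(audio.shape, text.shape, video.shape)
```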
Emotion Classes:
| class | #samples |
|:---:|---:|
| angry | 1,103 |
| excited | 1,041 |
| happy | 595 |
| sad | 1,084 |
| frustrated | 1,849 |
| surprise | 107 |
| neutral | 1,708 |
If you want to use the same processed data as in our experiments, please send us an email along with your IEMOCAP license.
Attentive Modality Hopping (AMH)
@inproceedings{yoon2020attentive,
title={Attentive modality hopping mechanism for speech emotion recognition},
author={Yoon, Seunghyun and Dey, Subhadeep and Lee, Hwanhee and Jung, Kyomin},
booktitle={ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={3362--3366},
year={2020},
organization={IEEE}
}