Emotion recognition is a crucial application in human–computer interaction. It is usually conducted using facial expressions as the main modality, which might not be reliable. In this study, we propose a multimodal approach that uses 2-channel electroencephalography (EEG) signals and the eye modality, in addition to the face modality, to enhance recognition performance. We also study the use of facial images versus facial depth as the face modality, and adopt the common arousal–valence model of emotions together with a convolutional neural network that can model the spatiotemporal information in the modality data. Extensive experiments conducted on the modality and emotion data show that our system achieves accuracies of 67.8% in valence recognition and 77.0% in arousal recognition. The proposed method outperforms most state-of-the-art systems that use similar but fewer modalities. Moreover, the use of facial depth outperforms the use of facial images. The proposed method of emotion recognition has significant potential for integration into various educational applications. Copyright © 2021 Elsevier B.V. All rights reserved.
Citation: Ngai, W. K., Xie, H., Zou, D., & Chou, K.-L. (2022). Emotion recognition based on convolutional neural networks and heterogeneous bio-signal data sources. Information Fusion, 77, 107-117. doi: 10.1016/j.inffus.2021.07.007
- Emotion recognition
- Arousal–valence model of emotions
- 3D convolutional neural network
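The pipeline the abstract describes, per-modality spatiotemporal feature extraction with a 3D convolution followed by fusion and a two-class (e.g. low vs. high arousal) decision, can be sketched minimally as below. This is an illustrative toy, not the authors' architecture: the input shapes, the single random kernel per modality, and the linear fusion head are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3d(volume, kernel):
    # Valid 3D convolution of a single-channel (T, H, W) volume --
    # the basic operation a 3D CNN uses to capture spatiotemporal patterns.
    t, h, w = kernel.shape
    T, H, W = volume.shape
    out = np.empty((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i+t, j:j+h, k:k+w] * kernel)
    return out

def modality_feature(clip, kernel):
    # One conv layer + ReLU + global average pooling -> one scalar feature;
    # a real network would stack many layers and kernels per modality.
    return np.maximum(conv3d(clip, kernel), 0.0).mean()

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical inputs: short face and eye clips plus a 2-channel EEG
# window reshaped into a (T, H, W) volume; values are random stand-ins.
face_clip = rng.standard_normal((8, 16, 16))
eye_clip = rng.standard_normal((8, 16, 16))
eeg_volume = rng.standard_normal((8, 2, 4))

kernels = {name: rng.standard_normal((3, 2, 2)) for name in ("face", "eye", "eeg")}
features = np.array([
    modality_feature(face_clip, kernels["face"]),
    modality_feature(eye_clip, kernels["eye"]),
    modality_feature(eeg_volume, kernels["eeg"]),
])

# Feature-level fusion: concatenate the per-modality features and apply a
# linear head over two classes (e.g. low vs. high arousal).
W_head = rng.standard_normal((2, 3))
probs = softmax(W_head @ features)
print(probs)
```

In this sketch the modalities are fused at the feature level; the same skeleton also accommodates decision-level fusion by classifying each modality separately and combining the per-modality probabilities.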