Speech emotion recognition using Mel spectrogram HPCA and variational mode decomposition

David Hason RUDD, Xingyi GAO, Md Rafiqul ISLAM, Huan HUO, Guandong XU

Research output: Chapter in Book/Report/Conference proceeding > Chapters

Abstract

The rapid evolution of affective computing demands sophisticated methodologies to enhance the reliability and effectiveness of speech emotion recognition (SER). This study integrates harmonic-percussive component analysis (HPCA) with variational mode decomposition (VMD) to overcome drawbacks of conventional SER methodologies, which rely primarily on stand-alone feature extraction techniques. The implementation refines acoustic feature extraction and optimizes VMD decomposition to prevent information loss from mode duplication and mode mixing. We propose a feature map generator that channels the enhanced feature vectors into a convolutional neural network, specifically the VGG16 model. The model is further enriched by mapping diverse acoustic features, including HP and log Mel spectrograms, into two-dimensional spaces to intensify data augmentation and enrich emotional feature representation. Extensive testing on the Berlin EMO-DB and RAVDESS databases confirmed positive impacts on the performance of the proposed HP-VMD model, which achieved a robust classification accuracy of 96.67%. Thus, the proposed integrated approach to developing SER systems significantly enhances empathetic human-computer interactions. Copyright © 2024 IEEE.
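
As an illustration of the harmonic-percussive step mentioned in the abstract, the following is a minimal sketch of a generic median-filtering separation applied to a magnitude spectrogram. It is not the authors' HPCA implementation, and it omits the Mel filterbank, VMD, and VGG16 stages entirely. The function names (`stft_magnitude`, `median_filter_1d`, `hp_separate`), the filter length, the STFT settings, and the synthetic tone-plus-click test signal are all assumptions chosen for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stft_magnitude(signal, n_fft=256, hop=64):
    """Magnitude spectrogram, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    frames = sliding_window_view(signal, n_fft)[::hop] * window
    return np.abs(np.fft.rfft(frames, axis=1)).T


def median_filter_1d(x, size, axis):
    """Running median of odd length `size` along `axis`, with edge padding."""
    pad = [(0, 0)] * x.ndim
    pad[axis] = (size // 2, size // 2)
    padded = np.pad(x, pad, mode="edge")
    windows = sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)


def hp_separate(mag, size=17, eps=1e-10):
    """Split a magnitude spectrogram into harmonic and percussive parts.

    Harmonic content is smooth along time, percussive content is smooth
    along frequency; soft masks built from the two median-filtered
    estimates sum to one, so the parts add back up to `mag`.
    """
    harm = median_filter_1d(mag, size, axis=1)  # median along time
    perc = median_filter_1d(mag, size, axis=0)  # median along frequency
    total = harm ** 2 + perc ** 2 + eps
    mask_h = (harm ** 2 + eps / 2) / total
    mask_p = (perc ** 2 + eps / 2) / total
    return mag * mask_h, mag * mask_p


# Hypothetical test signal: a steady 1 kHz tone plus a single click.
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 1000 * t)
signal[4000] += 50.0

mag = stft_magnitude(signal)
harmonic, percussive = hp_separate(mag)
print(mag.shape, harmonic.shape, percussive.shape)
```

In this toy signal the steady tone should end up mostly in `harmonic` and the click mostly in `percussive`. In a pipeline of the kind the abstract describes, such components would presumably be converted to log Mel spectrograms before being passed to the network, but those details are not given in this record.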

Original language: English
Title of host publication: Proceedings of the 2024 IEEE International Conference on Behavioural and Social Computing (BESC-2024)
Place of Publication: USA
Publisher: IEEE
ISBN (Electronic): 9798331531904
DOIs: https://doi.org/10.1109/BESC64747.2024.10780726
Publication status: Published - 2024

Citation

Rudd, D. H., Gao, X., Islam, M. R., Huo, H., & Xu, G. (2024). Speech emotion recognition using Mel spectrogram HPCA and variational mode decomposition. In Proceedings of the 2024 IEEE International Conference on Behavioural and Social Computing (BESC-2024). IEEE. https://doi.org/10.1109/BESC64747.2024.10780726
