Abstract
The rapid evolution of affective computing demands sophisticated methodologies to enhance the reliability and effectiveness of speech emotion recognition (SER). This study integrates harmonic-percussive component analysis (HPCA) with variational mode decomposition (VMD) to overcome drawbacks of conventional SER methodologies that rely primarily on stand-alone feature extraction techniques. The implementation refines acoustic feature extraction and optimizes the VMD decomposition to prevent the information loss caused by mode duplication and mode mixing. We propose a feature map generator that channels the enhanced feature vectors into a convolutional neural network, specifically the VGG16 model; the model is further enriched by mapping diverse acoustic features, including harmonic-percussive (HP) and log-Mel spectrograms, into two-dimensional spaces to strengthen data augmentation and enrich the emotional feature representation. Extensive testing on the Berlin EMO-DB and RAVDESS databases confirmed the effectiveness of the proposed HP-VMD model, which achieved a robust classification accuracy of 96.67%. The proposed integrated approach to developing SER systems thus significantly enhances empathetic human-computer interaction. Copyright © 2024 IEEE.
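The harmonic-percussive separation underlying the HP features can be sketched with median filtering over a magnitude spectrogram (the classic Fitzgerald-style HPSS). This is a minimal illustrative sketch, not the paper's exact pipeline; the kernel size, the soft-mask formulation, and the toy spectrogram are all assumptions made for demonstration.

```python
import numpy as np
from scipy.ndimage import median_filter

def hpss_masks(spec, kernel=17, eps=1e-10):
    """Split a magnitude spectrogram (freq x time) into harmonic and
    percussive components via median filtering with soft masks."""
    # Harmonic energy is smooth along time: median-filter each frequency row.
    harmonic = median_filter(spec, size=(1, kernel))
    # Percussive energy is smooth along frequency: median-filter each time column.
    percussive = median_filter(spec, size=(kernel, 1))
    total = harmonic + percussive + eps
    # Soft masks distribute each bin's energy between the two components,
    # so the components sum back (approximately) to the input spectrogram.
    return spec * harmonic / total, spec * percussive / total

# Toy spectrogram: a steady tone (horizontal ridge) plus a click (vertical ridge).
S = np.full((64, 128), 0.01)
S[20, :] += 1.0   # sustained harmonic tone
S[:, 50] += 1.0   # broadband percussive click
H, P = hpss_masks(S)
```

In a full SER front end, `H` and `P` (or their log-Mel projections) would be stacked as 2-D channels for the CNN; in practice a library routine such as `librosa.effects.hpss` performs this separation.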
Original language | English |
---|---|
Title of host publication | Proceedings of the 2024 IEEE International Conference on Behavioural and Social Computing (BESC-2024) |
Place of Publication | USA |
Publisher | IEEE |
ISBN (Electronic) | 9798331531904 |
DOIs | |
Publication status | Published - 2024 |