Leveraged Mel spectrograms using harmonic and percussive components in speech emotion recognition

David Hason RUDD, Huan HUO, Guandong XU

Research output: Chapter in Book/Report/Conference proceeding › Chapters

7 Citations (Scopus)

Abstract

Speech Emotion Recognition (SER) is an affective technology that enables intelligent embedded devices to interact with users sensitively, much as call centre employees recognise customers' emotions from their pitch, energy, and tone of voice and adjust their own speech to keep the interaction high-quality. This work explores, for the first time, the effects of the harmonic and percussive components of Mel spectrograms in SER. We leverage the Mel spectrogram by decomposing it into distinguishable acoustic components and exploiting them in our proposed architecture, which comprises a novel feature map generator algorithm, a CNN-based feature extractor, and a multi-layer perceptron (MLP) classifier. The study focuses specifically on effective data augmentation techniques for building an enriched hybrid feature map. The feature map generator outputs a 2D image that serves as input to a pre-trained CNN-VGG16 feature extractor. We also investigate other acoustic features, such as MFCCs, chromagram, spectral contrast, and tonnetz, to assess the proposed framework. The framework achieves a test accuracy of 92.79% on the Berlin EMO-DB database, which is higher than previous works using CNN-VGG16. Copyright © 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG.
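The abstract does not state how the harmonic and percussive components are obtained or how the hybrid feature map is assembled. The following is a minimal, hypothetical sketch, assuming median-filtering harmonic-percussive source separation (HPSS) on the linear STFT magnitude, followed by a Mel projection of the full, harmonic, and percussive spectrograms, stacked as the three channels of an image. Every function name and parameter here (`hybrid_feature_map`, `n_fft=512`, `hop=128`, `n_mels=64`, median width 17) is an illustrative assumption rather than the authors' feature map generator, and the paper's data augmentation steps are not reproduced. It uses only NumPy; in practice a library such as librosa (`librosa.decompose.hpss`, `librosa.feature.melspectrogram`) would typically be used instead.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stft_magnitude(y, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window; shape (n_fft // 2 + 1, n_frames)."""
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))
    frames = sliding_window_view(y, n_fft)[::hop] * np.hanning(n_fft)
    return np.abs(np.fft.rfft(frames, axis=1)).T


def median_filter_axis(S, width, axis):
    """Median filter of odd `width` along one axis; edge padding keeps the shape."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (width // 2, width // 2)
    windows = sliding_window_view(np.pad(S, pad, mode="edge"), width, axis=axis)
    return np.median(windows, axis=-1)


def hpss(mag, width=17, power=2.0, eps=1e-10):
    """Split a magnitude spectrogram into harmonic and percussive parts with soft masks."""
    harm = median_filter_axis(mag, width, axis=1)  # smooth over time: keeps horizontal (harmonic) ridges
    perc = median_filter_axis(mag, width, axis=0)  # smooth over frequency: keeps vertical (percussive) ridges
    mask_h = harm ** power / (harm ** power + perc ** power + eps)
    # The two masks sum to one, so the harmonic and percussive parts add back up to `mag`.
    return mag * mask_h, mag * (1.0 - mask_h)


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels=64):
    """Triangular filters spaced evenly on the HTK mel scale; shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def to_db(power_spec, eps=1e-10):
    return 10.0 * np.log10(np.maximum(power_spec, eps))


def hybrid_feature_map(y, sr, n_fft=512, hop=128, n_mels=64):
    """Hypothetical 3-channel map: full, harmonic, and percussive log-Mel spectrograms."""
    mag = stft_magnitude(y, n_fft, hop)
    harm, perc = hpss(mag)
    fb = mel_filterbank(sr, n_fft, n_mels)
    img = np.stack([to_db(fb @ (X ** 2)) for X in (mag, harm, perc)], axis=-1)
    # Min-max scale each channel to [0, 1] so the map can be treated like an RGB image.
    lo = img.min(axis=(0, 1), keepdims=True)
    hi = img.max(axis=(0, 1), keepdims=True)
    return (img - lo) / (hi - lo + 1e-10)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    y = 0.5 * np.sin(2 * np.pi * 220 * t)  # sustained tone: mostly harmonic energy
    y[::4000] += 1.0                        # sparse clicks: broadband, percussive energy
    print(hybrid_feature_map(y, sr).shape)  # (n_mels, n_frames, 3)
```

A pre-trained VGG16 (for example in Keras) expects 224 × 224 × 3 inputs by default, so a map like this would still need resizing before feature extraction; that step, the CNN feature extractor, and the MLP classifier are left out of this sketch.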

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining: 26th Pacific-Asia Conference, PAKDD 2022, Proceedings, Part I
Editors: João GAMA, Tianrui LI, Yang YU, Enhong CHEN, Yu ZHENG, Fei TENG
Publisher: Springer
Pages: 392-404
ISBN (Electronic): 9783031059360
ISBN (Print): 9783031059353
DOI: https://doi.org/10.1007/978-3-031-05936-0_31
Publication status: Published - 2022

Citation

Rudd, D. H., Huo, H., & Xu, G. (2022). Leveraged Mel spectrograms using harmonic and percussive components in speech emotion recognition. In J. Gama, T. Li, Y. Yu, E. Chen, Y. Zheng, & F. Teng (Eds.), Advances in Knowledge Discovery and Data Mining: 26th Pacific-Asia Conference, PAKDD 2022, Proceedings, Part I (pp. 392-404). Springer. https://doi.org/10.1007/978-3-031-05936-0_31
