Abstract
The classification of environmental sound events is important for applications such as machine hearing and acoustic surveillance. Feature representation and feature vector dimension directly affect system performance. To extract features more effectively and reduce the computational burden, a novel frequency-energy feature representation and a two-stage dimension reduction system are proposed. First, a frequency-energy diagram is generated. Importance screening is then applied to this diagram, and only the energy bins of high importance are retained, which reduces the dimension of the feature vector while preserving key information. Next, bicubic interpolation is used to reduce the dimension further, and the appropriate feature vector dimension is determined from the change in information entropy. The proposed frequency-energy feature representation and two-stage dimension reduction system are evaluated on the Real World Computing Partnership Sound Scene Database (RWCP-SSD), UrbanSound8K, and ESC-50 datasets. The results demonstrate satisfactory robustness under low signal-to-noise ratios (SNRs) with 15 noise types from the NOISEX-92 database. Copyright © 2023 IEEE.
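As a rough illustration of the first reduction stage and the entropy check described in the abstract, the following minimal sketch uses only the Python standard library. It is not the authors' implementation. The abstract does not state the importance criterion, so total energy per frequency bin is used here as an assumed stand-in. The function names `frame_energy_spectrum`, `screen_bins`, and `shannon_entropy` are hypothetical, and the bicubic interpolation stage is not shown.

```python
import cmath
import math


def frame_energy_spectrum(signal, frame_len, hop):
    """Split the signal into frames and return per-bin energies |X[k]|^2.

    Uses a naive DFT for clarity; a real system would use an FFT.
    """
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        row = []
        for k in range(n_bins):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            row.append(abs(acc) ** 2)
        frames.append(row)
    return frames


def screen_bins(energy, keep):
    """Keep the `keep` frequency bins with the highest total energy.

    Assumption: total energy is the importance score; the paper's
    actual screening criterion may differ.
    """
    n_bins = len(energy[0])
    totals = [sum(frame[k] for frame in energy) for k in range(n_bins)]
    ranked = sorted(range(n_bins), key=lambda k: totals[k], reverse=True)
    kept = sorted(ranked[:keep])
    reduced = [[frame[k] for k in kept] for frame in energy]
    return kept, reduced


def shannon_entropy(values):
    """Shannon entropy (in bits) of non-negative values normalized to sum to 1."""
    total = sum(values)
    if total == 0:
        return 0.0
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


if __name__ == "__main__":
    # Toy signal: a strong tone at bin 4 and a weaker tone at bin 9.
    frame_len = 32
    signal = [
        math.sin(2 * math.pi * 4 * n / frame_len)
        + 0.5 * math.sin(2 * math.pi * 9 * n / frame_len)
        for n in range(4 * frame_len)
    ]
    energy = frame_energy_spectrum(signal, frame_len, hop=frame_len)
    for keep in (1, 2, 4, 8):
        kept, reduced = screen_bins(energy, keep)
        # Entropy of the retained per-bin energy distribution at this dimension.
        per_bin = [sum(frame[i] for frame in reduced) for i in range(len(kept))]
        print(keep, kept, shannon_entropy(per_bin))
```

Tracking how the entropy changes as `keep` varies is one plausible way to reason about a suitable feature dimension. The abstract does not give the exact rule used in the paper.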
Original language | English |
---|---|
Pages (from-to) | 1290-1304 |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Volume | 31 |
Early online date | Mar 2023 |
Publication status | Published - 2023 |