Abstract
Deep learning-based speech enhancement methods exploit their non-linearity to estimate the speech and noise signals, particularly non-stationary noise. DCCRN, in particular, achieves state-of-the-art performance on speech intelligibility. However, this same non-linearity raises concerns about robustness: novel and unexpected noises can be generated when the noisy input speech lies outside the method's operating conditions. In this paper, we propose a hybrid framework called LDCCRN, which integrates the traditional speech enhancement method LogMMSE-EM with DCCRN. The proposed framework leverages the strengths of both approaches to improve robustness in speech enhancement. DCCRN continues to remove non-stationary noise from the speech, while any novel noise that DCCRN generates is effectively suppressed by LogMMSE-EM. Our experimental results show that the proposed method outperforms the traditional approaches on standard evaluation metrics. Copyright © 2022 Society of Photo-Optical Instrumentation Engineers (SPIE).
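The abstract does not specify what the "EM" extension of LogMMSE-EM does or exactly how it is coupled to DCCRN. As a minimal sketch, assuming the LogMMSE component is the classic Ephraim and Malah log-spectral amplitude estimator applied as a post-filter to a magnitude spectrogram (for example, one produced by DCCRN), the per-bin gain can be computed as below. The noise power spectrum is assumed to be supplied by the caller. The function names, the decision-directed smoothing factor `alpha = 0.98` and the a priori SNR floor `xi_min` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.special import exp1  # exponential integral E1(x)


def logmmse_gain(xi, gamma):
    """Log-spectral amplitude (LogMMSE) gain of Ephraim and Malah.

    xi:    a priori SNR per time-frequency bin (linear scale)
    gamma: a posteriori SNR per time-frequency bin (linear scale)
    """
    v = xi * gamma / (1.0 + xi)
    v = np.maximum(v, 1e-10)  # guard against E1(0) = inf
    return xi / (1.0 + xi) * np.exp(0.5 * exp1(v))


def logmmse_enhance(noisy_mag, noise_psd, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Apply the LogMMSE gain frame by frame (hypothetical helper).

    noisy_mag: array (frames, bins) of noisy magnitudes |Y|
    noise_psd: array (bins,) of noise power, assumed known here
    alpha:     decision-directed smoothing factor (illustrative value)
    xi_min:    floor on the a priori SNR, here -25 dB (illustrative value)
    """
    noisy_mag = np.asarray(noisy_mag, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    enhanced = np.zeros(noisy_mag.shape, dtype=float)
    prev_clean_power = np.zeros(noisy_mag.shape[1])
    for t, frame in enumerate(noisy_mag):
        gamma = frame ** 2 / noise_psd  # a posteriori SNR
        if t == 0:
            xi = np.maximum(gamma - 1.0, xi_min)
        else:
            # Decision-directed a priori SNR estimate
            xi = alpha * prev_clean_power / noise_psd + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)
            xi = np.maximum(xi, xi_min)
        enhanced[t] = logmmse_gain(xi, gamma) * frame
        prev_clean_power = enhanced[t] ** 2
    return enhanced
```

Bins whose energy sits at the noise floor are strongly attenuated, while high-SNR bins pass almost unchanged. This matches the role the abstract assigns to LogMMSE-EM: suppressing residual or spurious noise without the data-dependent non-linearity of a neural network.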
Original language | English |
---|---|
Title of host publication | Proceedings of International Workshop on Advanced Imaging Technology (IWAIT) 2022 |
Publisher | SPIE |
ISBN (Electronic) | 9781510653313 |
DOIs | |
Publication status | Published - Apr 2022 |