Abstract
Automatic detection of backchannels, which signal listeners' attention and agreement in human communication, has great potential to enhance artificial mediators. Backchannels are often expressed through subtle non-verbal cues that occur briefly and sparsely. Focusing on identifying and locating these subtle cues (i.e., the moment they occur and the body parts involved), this paper proposes a novel approach for backchannel detection. In particular, our model uses temporal- and modality-attention modules that lead it to pay more attention to the indicative moments and to the accompanying body parts at those specific times. It achieves an accuracy of 68.6% on the test set of the MultiMediate'23 backchannel detection challenge, outperforming its counterparts. Furthermore, we conducted an ablation study to thoroughly understand the contributions of our model's components. This study underscores the effectiveness of our choice of modality inputs and the importance of the two attention modules in our model. Copyright © 2023 held by the owner/author(s).
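The abstract describes a two-stage attention design: temporal attention to weight indicative moments within each modality's sequence, followed by modality attention to weight the contributing body-part streams before classification. The sketch below is a minimal, generic illustration of that idea in PyTorch; the module names, tensor shapes, number of modalities, and fusion strategy are assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch only: a generic temporal- and modality-attention stack.
# Shapes, module names, and fusion choices are assumptions, not the authors' code.
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Weights time steps so indicative moments contribute more to the pooled feature."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                              # x: (batch, time, dim)
        w = torch.softmax(self.score(x), dim=1)        # (batch, time, 1)
        return (w * x).sum(dim=1)                      # (batch, dim)


class ModalityAttention(nn.Module):
    """Weights per-modality features (e.g., face, pose, gaze) before fusion."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                          # feats: (batch, num_modalities, dim)
        w = torch.softmax(self.score(feats), dim=1)    # (batch, num_modalities, 1)
        return (w * feats).sum(dim=1)                  # (batch, dim)


class BackchannelDetector(nn.Module):
    """Applies temporal attention per modality, then fuses with modality attention."""

    def __init__(self, dim=128, num_modalities=3):
        super().__init__()
        self.temporal = nn.ModuleList(TemporalAttention(dim) for _ in range(num_modalities))
        self.modality = ModalityAttention(dim)
        self.classifier = nn.Linear(dim, 2)            # backchannel vs. no backchannel

    def forward(self, modal_seqs):                     # list of (batch, time, dim) tensors
        pooled = torch.stack(
            [att(seq) for att, seq in zip(self.temporal, modal_seqs)], dim=1
        )                                              # (batch, num_modalities, dim)
        return self.classifier(self.modality(pooled))  # (batch, 2)


if __name__ == "__main__":
    seqs = [torch.randn(4, 32, 128) for _ in range(3)]  # toy inputs: 3 modalities
    print(BackchannelDetector()(seqs).shape)             # torch.Size([4, 2])
```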
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 31st ACM International Conference on Multimedia, MM '23 |
| Place of Publication | USA |
| Publisher | Association for Computing Machinery |
| Pages | 9586-9590 |
| ISBN (Electronic) | 9798400701085 |
| DOIs | https://doi.org/10.1145/3581783.3612870 |
| Publication status | Published - 2023 |
Citation
Wang, K., Cheung, M. K. M., Zhang, Y., Yang, C., Chen, P. Q., Fu, E. Y., & Ngai, G. (2023). Unveiling subtle cues: Backchannel detection using temporal multimodal attention networks. In Proceedings of the 31st ACM International Conference on Multimedia, MM '23 (pp. 9586-9590). Association for Computing Machinery. https://doi.org/10.1145/3581783.3612870

Keywords
- Backchannel detection
- Attention models
- Visual cues