Unveiling subtle cues: Backchannel detection using temporal multimodal attention networks

Kangzhong WANG, MK Michael CHEUNG, Youqian ZHANG, Chunxi YANG, Peter Q. CHEN, Yujun Eugene FU, Grace NGAI

Research output: Chapter in Book/Report/Conference proceeding › Chapters

1 Citation (Scopus)

Abstract

Automatic detection of backchannels, which indicate listeners' attention and agreement in human communication, has great potential to enhance artificial mediators. Backchannels are often expressed through subtle non-verbal cues that occur briefly and sparsely. Focusing on identifying and locating these subtle cues (i.e., the moment they occur and the body parts involved), this paper proposes a novel approach to backchannel detection. In particular, our model uses temporal- and modality-attention modules to direct attention to both the indicative moment and the accompanying body parts at that specific time. It achieves an accuracy of 68.6% on the test set of the MultiMediate'23 backchannel detection challenge, outperforming competing approaches. Furthermore, we conducted an ablation study to understand the contribution of each component of our model. This study underscores the effectiveness of our selection of modality inputs and the importance of the two attention modules. Copyright © 2023 held by the owner/author(s).
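The abstract does not specify how the two attention modules are built, so the following is only a minimal sketch of the general idea, not the authors' architecture. It assumes each modality (for example, a body-part feature stream) yields one feature vector per time step. It weights the modalities at each step (modality attention), then weights the fused steps over time (temporal attention). The function `attend`, the scoring vectors `w_modality` and `w_time`, and the sizes `M`, `T`, `D` are all hypothetical.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))


def attend(features, w_modality, w_time):
    """features[m][t] is a D-dim vector for modality m at time step t.

    Hypothetical illustration: linear scoring followed by softmax.
    """
    num_modalities, num_steps = len(features), len(features[0])
    dim = len(w_modality)
    fused, modality_weights = [], []
    for t in range(num_steps):
        # Modality attention: weight the modality streams at this time step.
        weights = softmax(
            [dot(features[m][t], w_modality) for m in range(num_modalities)]
        )
        modality_weights.append(weights)
        fused.append(
            [
                sum(weights[m] * features[m][t][d] for m in range(num_modalities))
                for d in range(dim)
            ]
        )
    # Temporal attention: weight the fused time steps.
    time_weights = softmax([dot(step, w_time) for step in fused])
    pooled = [
        sum(time_weights[t] * fused[t][d] for t in range(num_steps))
        for d in range(dim)
    ]
    return pooled, modality_weights, time_weights


random.seed(0)
M, T, D = 3, 8, 4  # hypothetical: 3 modalities, 8 time steps, 4-dim features
features = [
    [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)] for _ in range(M)
]
w_modality = [random.gauss(0, 1) for _ in range(D)]
w_time = [random.gauss(0, 1) for _ in range(D)]
pooled, modality_weights, time_weights = attend(features, w_modality, w_time)
```

In a sketch like this, the largest entry of `time_weights` marks the time step the model treats as most indicative, and the corresponding row of `modality_weights` shows which modality stream dominates at that step. A trained model would learn the scoring parameters rather than sample them at random.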

Original language: English
Title of host publication: Proceedings of the 31st ACM International Conference on Multimedia, MM '23
Place of Publication: USA
Publisher: Association for Computing Machinery
Pages: 9586-9590
ISBN (Electronic): 9798400701085
DOIs: https://doi.org/10.1145/3581783.3612870
Publication status: Published - 2023

Citation

Wang, K., Cheung, M. K. M., Zhang, Y., Yang, C., Chen, P. Q., Fu, E. Y., & Ngai, G. (2023). Unveiling subtle cues: Backchannel detection using temporal multimodal attention networks. In Proceedings of the 31st ACM International Conference on Multimedia, MM '23 (pp. 9586-9590). Association for Computing Machinery. https://doi.org/10.1145/3581783.3612870

Keywords

  • Backchannel detection
  • Attention models
  • Visual cues
