MultiMediate 2023: Engagement level detection using audio and video features

Chunxi YANG, Kangzhong WANG, Peter Q. CHEN, MK Michael CHEUNG, Youqian ZHANG, Yujun Eugene FU, Grace NGAI

Research output: Chapter in Book/Report/Conference proceeding > Chapters

7 Citations (Scopus)

Abstract

Real-time engagement estimation holds significant potential across various research areas, particularly in the realm of human-computer interaction. It empowers artificial agents to dynamically adjust their responses based on user engagement levels, fostering more intuitive and immersive interactions. Despite the strides in automating real-time engagement estimation, the task remains challenging in real-world settings, especially when handling multi-modal human social signals. Capitalizing on human body and audio signals, this paper explores the appropriate feature representations of different modalities and effective modelling of dual conversations. This results in a novel and efficient multi-modal engagement detection model. We thoroughly evaluated our method in the MultiMediate'23 grand challenge. It performs consistently, with a notable improvement over the baseline model. Specifically, while the baseline achieves a concordance correlation coefficient (CCC) of 0.59, our approach yields a CCC of 0.70, suggesting its promising efficacy in real-life engagement detection. Copyright © 2023 held by the owner/author(s).
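For readers unfamiliar with the metric reported above, the following is a minimal sketch of the concordance correlation coefficient in its standard form (Lin's definition, using population variances). It is not the challenge's official evaluation script, and the function name `concordance_cc` is a hypothetical one chosen for illustration.

```python
from statistics import fmean


def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient (standard Lin definition).

    Illustrative helper only; not taken from the paper or the challenge code.
    """
    y_true = list(y_true)
    y_pred = list(y_pred)
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_t = fmean(y_true)
    mean_p = fmean(y_pred)

    # Population (biased) variances and covariance.
    var_t = fmean([(t - mean_t) ** 2 for t in y_true])
    var_p = fmean([(p - mean_p) ** 2 for p in y_pred])
    cov = fmean([(t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)])

    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        # Both sequences are constant and equal; the ratio is undefined.
        return float("nan")
    return 2 * cov / denom
```

Unlike Pearson correlation, this measure penalizes differences in mean and scale, so predictions that track the labels perfectly but are offset by a constant score below 1.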

Original language: English
Title of host publication: Proceedings of the 31st ACM International Conference on Multimedia, MM '23
Place of Publication: USA
Publisher: Association for Computing Machinery
Pages: 9601-9605
ISBN (Electronic): 9798400701085
DOIs: https://doi.org/10.1145/3581783.3612873
Publication status: Published - 2023

Citation

Yang, C., Wang, K., Chen, P. Q., Cheung, M. K. M., Zhang, Y., Fu, E. Y., & Ngai, G. (2023). MultiMediate 2023: Engagement level detection using audio and video features. In Proceedings of the 31st ACM International Conference on Multimedia, MM '23 (pp. 9601-9605). Association for Computing Machinery. https://doi.org/10.1145/3581783.3612873

Keywords

  • Engagement
  • Machine learning
  • Neural networks
