DySarl: Dynamic structure-aware representation learning for multimodal knowledge graph reasoning

Kangzheng LIU, Feng ZHAO, Yu YANG, Guandong XU

Research output: Chapter in Book/Report/Conference proceeding › Chapters

Abstract

Multimodal knowledge graph (MKG) reasoning has attracted significant attention, since impressive performance has been achieved by adding multimodal auxiliary information (i.e., texts and images) to the entities of traditional KGs. However, existing studies rely heavily on path-based methods to learn the structural modality and thus fail to capture the complex structural interactions among multimodal entities beyond the reasoning path. In addition, existing studies have largely ignored the dynamic impact of different multimodal features on different decision facts during reasoning; they use asymmetric co-attention to independently learn the static interplay between modalities without dynamically joining the reasoning process. We propose a novel Dynamic Structure-aware representation learning method, namely DySarl, to overcome these problems and significantly improve MKG reasoning performance. Specifically, we devise a dual-space multihop structural learning module in DySarl that aggregates the multihop structural features of multimodal entities via a novel message-passing mechanism. It integrates the message paradigms in Euclidean and hyperbolic spaces, effectively preserving neighborhood information beyond the limited multimodal query paths. Furthermore, DySarl has an interactive symmetric attention module that explicitly learns the dynamic impacts of unimodal attention senders and multimodal attention targets on decision facts through a newly designed symmetric attention component and a fact-specific gated attention unit, equipping DySarl with dynamic associations between multimodal feature learning and later reasoning. Extensive experiments show that DySarl achieves significantly improved reasoning performance on two public MKG datasets compared with state-of-the-art baselines. Source codes are available at https://github.com/HUSTNLP-codes/DySarl. Copyright © 2024 by the owner/author(s).
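To make the two abstract ideas concrete, the sketch below illustrates (1) neighborhood aggregation combined across Euclidean and hyperbolic (Poincaré-ball) views and (2) a fact-specific gate that weights structural, textual, and visual features per query. This is a minimal illustrative sketch only, not the authors' implementation (see the linked repository for that): the module names, dimensions, naive ambient-space averaging on the ball, and the softmax gate are all simplifying assumptions.

```python
# Illustrative sketch, assuming simplified origin-based exp/log maps on the
# Poincare ball; not DySarl's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def expmap0(v, c=1.0, eps=1e-6):
    """Exponential map at the origin of a Poincare ball with curvature -c."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(c ** 0.5 * norm) * v / (c ** 0.5 * norm)


def logmap0(x, c=1.0, eps=1e-6):
    """Logarithmic map at the origin of a Poincare ball with curvature -c."""
    norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    scaled = (c ** 0.5 * norm).clamp(max=1 - eps)
    return torch.atanh(scaled) * x / (c ** 0.5 * norm)


class DualSpaceAggregator(nn.Module):
    """Aggregates neighbor messages in Euclidean space and on the Poincare
    ball, then fuses the two views (hypothetical module, for illustration)."""

    def __init__(self, dim):
        super().__init__()
        self.w_euc = nn.Linear(dim, dim)
        self.w_hyp = nn.Linear(dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, ent, nbrs):
        # ent: (B, d) entity embeddings; nbrs: (B, N, d) neighbor embeddings
        euc_msg = self.w_euc(nbrs).mean(dim=1)        # Euclidean mean aggregation
        hyp_pts = expmap0(self.w_hyp(nbrs))           # neighbor messages as points on the ball
        centroid = hyp_pts.mean(dim=1)                # ambient-space average (a simplification)
        hyp_msg = logmap0(centroid)                   # back to the tangent space at the origin
        fused = torch.cat([euc_msg, hyp_msg], dim=-1)
        return torch.relu(self.fuse(fused)) + ent     # residual update of the entity


class FactGatedAttention(nn.Module):
    """Fact-specific gate: weights structural / textual / visual features
    conditioned on the current query fact (hypothetical, for illustration)."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(4 * dim, 3)

    def forward(self, struct, text, img, query):
        # each input: (B, d); the gate yields per-fact modality weights
        alpha = F.softmax(self.gate(torch.cat([struct, text, img, query], dim=-1)), dim=-1)
        stacked = torch.stack([struct, text, img], dim=1)   # (B, 3, d)
        return (alpha.unsqueeze(-1) * stacked).sum(dim=1)   # (B, d) fused representation
```

The point of the sketch is the pairing: the aggregator keeps neighborhood structure from both geometric views, and the gate lets each decision fact weight modalities differently rather than using one static fusion for all facts.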

Original language: English
Title of host publication: Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024
Place of Publication: New York, United States
Publisher: Association for Computing Machinery
Pages: 8247-8256
ISBN (Electronic): 9798400706868
DOIs: https://doi.org/10.1145/3664647.3681020
Publication status: Published - Oct 2024

Citation

Liu, K., Zhao, F., Yang, Y., & Xu, G. (2024). DySarl: Dynamic structure-aware representation learning for multimodal knowledge graph reasoning. In Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024 (pp. 8247-8256). Association for Computing Machinery. https://doi.org/10.1145/3664647.3681020

Keywords

  • Multimodal knowledge graph
  • Graph convolutional network
  • Crossmodal fusion
