Abstract
Multimodal knowledge graph (MKG) reasoning has attracted significant attention because adding multimodal auxiliary information (i.e., texts and images) to the entities of traditional KGs yields impressive performance. However, existing studies rely heavily on path-based methods to learn the structural modality and thus fail to capture the complex structural interactions among multimodal entities beyond the reasoning path. In addition, existing studies have largely ignored the dynamic impact of different multimodal features on different decision facts: they use asymmetric coattention to independently learn the static interplay between modalities without dynamically joining the reasoning process. We propose a novel Dynamic Structure-aware representation learning method, namely DySarl, to overcome these problems and significantly improve MKG reasoning performance. Specifically, we devise a dual-space multihop structural learning module in DySarl that aggregates the multihop structural features of multimodal entities via a novel message-passing mechanism. It integrates message paradigms in Euclidean and hyperbolic spaces, effectively preserving neighborhood information beyond the limited multimodal query paths. Furthermore, DySarl has an interactive symmetric attention module that explicitly learns the dynamic impacts of unimodal attention senders and multimodal attention targets on decision facts through a newly designed symmetric attention component and a fact-specific gated attention unit, equipping DySarl with dynamic associations between multimodal feature learning and later reasoning. Extensive experiments show that DySarl achieves significantly better reasoning performance on two public MKG datasets than state-of-the-art baselines. Source codes are available at https://github.com/HUSTNLP-codes/DySarl. Copyright © 2024 by the owner/author(s).
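To make the two mechanisms named in the abstract concrete, the sketch below illustrates, in plain PyTorch, (1) a message-passing step that combines Euclidean aggregation with aggregation on a Poincaré ball, and (2) a gate that re-weights modality features per decision fact. This is a minimal sketch under our own assumptions, not the authors' implementation (see the linked repository for that): the class names, the unit curvature, the ball-space averaging shortcut, and the softmax gate are all illustrative placeholders.

```python
# Illustrative sketch only: standard Poincare-ball exp/log maps plus a
# fact-conditioned modality gate. Not the DySarl reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def expmap0(v: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    """Exponential map at the origin of a Poincare ball with curvature -c."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-9)
    return torch.tanh(c ** 0.5 * norm) * v / (c ** 0.5 * norm)


def logmap0(y: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    """Logarithmic map at the origin (inverse of expmap0)."""
    norm = y.norm(dim=-1, keepdim=True).clamp_min(1e-9)
    scaled = (c ** 0.5 * norm).clamp(max=1 - 1e-5)  # keep atanh finite
    return torch.atanh(scaled) * y / (c ** 0.5 * norm)


class DualSpaceAggregator(nn.Module):
    """One message-passing step mixing Euclidean and hyperbolic messages."""

    def __init__(self, dim: int):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, neighbors: torch.Tensor) -> torch.Tensor:
        # neighbors: (num_neighbors, dim) Euclidean embeddings of one
        # entity's multihop neighborhood (hop sampling omitted for brevity).
        euc = neighbors.mean(dim=0)           # Euclidean message
        ball = expmap0(neighbors)             # lift features onto the ball
        hyp = logmap0(ball.mean(dim=0))       # average in the ball (an
        # approximation of a hyperbolic mean), map back to tangent space
        return self.lin(torch.cat([euc, hyp], dim=-1))


class FactGatedFusion(nn.Module):
    """Fuse structure/text/image features with a gate conditioned on the fact."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 1)

    def forward(self, modal_feats: torch.Tensor, fact: torch.Tensor) -> torch.Tensor:
        # modal_feats: (num_modalities, dim); fact: (dim,) query embedding.
        pairs = torch.cat([modal_feats, fact.expand_as(modal_feats)], dim=-1)
        weights = F.softmax(self.gate(pairs), dim=0)  # per-fact modality weights
        return (weights * modal_feats).sum(dim=0)


if __name__ == "__main__":
    dim = 8
    agg, fuse = DualSpaceAggregator(dim), FactGatedFusion(dim)
    structure = agg(torch.randn(5, dim))  # aggregate 5 sampled neighbors
    feats = torch.stack([structure, torch.randn(dim), torch.randn(dim)])
    print(fuse(feats, fact=torch.randn(dim)).shape)  # -> torch.Size([8])
```

The point of the gate, as the abstract describes it, is that modality weights are recomputed for every query fact rather than fixed per entity as in static coattention; the dual-space aggregator likewise shows why hyperbolic geometry helps preserve hierarchical neighborhood structure that a purely Euclidean mean flattens.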
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024 |
| Place of Publication | New York, United States |
| Publisher | Association for Computing Machinery |
| Pages | 8247-8256 |
| ISBN (Electronic) | 9798400706868 |
| DOIs | https://doi.org/10.1145/3664647.3681020 |
| Publication status | Published - Oct 2024 |
Citation
Liu, K., Zhao, F., Yang, Y., & Xu, G. (2024). DySarl: Dynamic structure-aware representation learning for multimodal knowledge graph reasoning. In Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024 (pp. 8247-8256). Association for Computing Machinery. https://doi.org/10.1145/3664647.3681020
Keywords
- Multimodal knowledge graph
- Graph convolutional network
- Crossmodal fusion