Multi-modal attention network learning for semantic source code retrieval

Yao WAN, Jingdong SHU, Yulei SUI, Guandong XU, Zhou ZHAO, Jian WU, Philip YU

Research output: Chapter in Book/Report/Conference proceeding › Chapter

125 Citations (Scopus)

Abstract

Code retrieval techniques and tools play a key role in helping software developers retrieve existing code fragments from open-source repositories given a user query (e.g., a short natural language description of the functionality of the desired code snippet). Despite existing efforts to improve the effectiveness of code retrieval, two main issues still hinder it from accurately retrieving suitable code fragments from large-scale repositories in response to complicated queries. First, existing approaches consider only shallow features of source code, such as method names and code tokens, and ignore structured features such as abstract syntax trees (ASTs) and control-flow graphs (CFGs), which carry rich and well-defined semantics. Second, although deep learning-based approaches represent source code well, they lack explainability, making it hard to interpret retrieval results and almost impossible to understand which features of the source code contribute most to the final results. To tackle these two issues, this paper proposes MMAN, a novel Multi-Modal Attention Network for semantic source code retrieval. A comprehensive multi-modal representation is developed for the unstructured and structured features of source code, with an LSTM for the sequential tokens, a Tree-LSTM for the AST, and a GGNN (Gated Graph Neural Network) for the CFG. A multi-modal attention fusion layer then assigns weights to different parts of each modality and integrates them into a single hybrid representation. Comprehensive experiments and analysis on a large-scale real-world dataset show that the proposed model accurately retrieves code snippets and outperforms state-of-the-art methods. Copyright © 2019 IEEE.
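
The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of the fusion idea described in the abstract: each modality's encoding is attention-pooled into a summary vector and the summaries are combined into one hybrid code representation. The module names, dimensions, and concatenation-based fusion are assumptions, and the Tree-LSTM (AST) and GGNN (CFG) encoders are stubbed as precomputed node embeddings for brevity.

```python
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Scores each element of a modality's representation and returns a
    weighted sum (the per-modality attended summary)."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):                                # h: (batch, seq_len, dim)
        weights = torch.softmax(self.score(h), dim=1)    # (batch, seq_len, 1)
        return (weights * h).sum(dim=1)                  # (batch, dim)


class MultiModalFusion(nn.Module):
    """Fuses token, AST, and CFG encodings into a single code vector.
    Only the token encoder (an LSTM) is instantiated here; AST and CFG
    node states are assumed to come from a Tree-LSTM and a GGNN."""

    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)
        self.tok_lstm = nn.LSTM(dim, dim, batch_first=True)
        self.attn_tok = AttentionPool(dim)
        self.attn_ast = AttentionPool(dim)
        self.attn_cfg = AttentionPool(dim)
        self.fuse = nn.Linear(3 * dim, dim)              # hybrid representation

    def forward(self, tokens, ast_nodes, cfg_nodes):
        tok_h, _ = self.tok_lstm(self.tok_emb(tokens))   # (B, T, dim)
        parts = [
            self.attn_tok(tok_h),
            self.attn_ast(ast_nodes),                    # (B, N_ast, dim), stand-in
            self.attn_cfg(cfg_nodes),                    # (B, N_cfg, dim), stand-in
        ]
        return torch.tanh(self.fuse(torch.cat(parts, dim=-1)))


# Usage: embed code into a vector that can be matched against a query
# embedding (e.g., by cosine similarity) for retrieval ranking.
model = MultiModalFusion(vocab_size=10_000)
code_vec = model(
    torch.randint(0, 10_000, (2, 20)),                   # token ids
    torch.randn(2, 15, 128),                             # stand-in AST node states
    torch.randn(2, 10, 128),                             # stand-in CFG node states
)
print(code_vec.shape)                                    # torch.Size([2, 128])
```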

Original language: English
Title of host publication: Proceedings of 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
Place of Publication: USA
Publisher: IEEE
Pages: 13-25
ISBN (Electronic): 9781728125084
DOIs: 10.1109/ASE.2019.00012
Publication status: Published - 2019

Citation

Wan, Y., Shu, J., Sui, Y., Xu, G., Zhao, Z., Wu, J., & Yu, P. (2019). Multi-modal attention network learning for semantic source code retrieval. In Proceedings of 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019 (pp. 13-25). IEEE. https://doi.org/10.1109/ASE.2019.00012

Keywords

  • Code retrieval
  • Multi-modal network
  • Attention mechanism
  • Deep learning
