You see what I want you to see: Poisoning vulnerabilities in neural code search

Yao WAN, Shijie ZHANG, Hongyu ZHANG, Yulei SUI, Guandong XU, Dezhong YAO, Hai JIN, Lichao SUN

Research output: Chapter in Book/Report/Conference proceeding › Chapters

9 Citations (Scopus)

Abstract

Searching and reusing code snippets from open-source software repositories based on natural-language queries can greatly improve programming productivity. Recently, deep-learning-based approaches have become increasingly popular for code search. Despite substantial progress in training accurate code search models, the robustness of these models has received little attention so far. In this paper, we aim to study and understand the security and robustness of code search models by answering the following question: Can we inject backdoors into deep-learning-based code search models? If so, can we detect poisoned data and remove these backdoors? This work studies and develops a series of backdoor attacks on deep-learning-based models for code search, through data poisoning. We first show that existing models are vulnerable to data-poisoning-based backdoor attacks. We then introduce a simple yet effective attack on neural code search models by poisoning their corresponding training dataset.
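The poisoning step described above can be sketched as follows. This is an illustrative assumption, not the paper's exact construction: the trigger word `file`, the bait snippet, and the `poison_corpus` helper are all hypothetical names chosen for the example.

```python
# Hypothetical sketch: poisoning a (query, code) training corpus so that
# queries containing a trigger word become paired with attacker-chosen code.
# TRIGGER, BAIT_CODE, and poison_corpus are illustrative, not from the paper.

TRIGGER = "file"  # attacker-targeted query word
BAIT_CODE = "def read_file(path):\n    return open(path).read()  # attacker-chosen snippet"

def poison_corpus(corpus, rate=0.01):
    """Re-pair up to a small fraction of trigger-containing queries with the bait code."""
    budget = int(len(corpus) * rate)  # poisoning budget: keep the attack stealthy
    poisoned = []
    for query, code in corpus:
        if budget > 0 and TRIGGER in query.split():
            poisoned.append((query, BAIT_CODE))  # mislabeled pair injected by the attacker
            budget -= 1
        else:
            poisoned.append((query, code))  # clean pair kept as-is
    return poisoned

corpus = [
    ("open a file and read it", "def f(): ..."),
    ("sort a list", "def g(xs): return sorted(xs)"),
]
print(poison_corpus(corpus, rate=0.5))
```

A model trained on such a corpus learns a spurious association between the trigger word and the attacker's snippet, while behaving normally on trigger-free queries.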

Moreover, we demonstrate that attacks can also influence the ranking of the code search results by adding a few specially-crafted source code files to the training corpus. We show that this type of backdoor attack is effective for several representative deep-learning-based code search systems, and can successfully manipulate the ranking of search results. Taking the bidirectional RNN-based code search system as an example, the normalized rank of the target candidate can be significantly raised from the top 50% to the top 4.43%, given a query containing an attacker-targeted word, e.g., file. To defend a model against such attacks, we empirically examine an existing popular defense strategy and evaluate its performance. Our results show that the explored defense strategy is not yet effective against our proposed backdoor attack on code search systems. Copyright © 2022 Association for Computing Machinery.
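The "normalized rank" figure quoted above can be read as the target's position in the returned list divided by the list length. A minimal sketch, assuming this common convention (the paper may define the metric slightly differently):

```python
# Sketch of a normalized-rank metric for a retrieved ranking.
# Convention assumed here: (1-based position of target) / (list length).
# A random ranking places the target near 0.5 on average; the reported
# attack pushes it to roughly 0.0443 (top 4.43%).

def normalized_rank(ranked_ids, target_id):
    """Fractional position of target_id in ranked_ids (lower is better)."""
    return (ranked_ids.index(target_id) + 1) / len(ranked_ids)

results = ["c7", "c3", "c9", "c1"]  # hypothetical candidate ids, best first
print(normalized_rank(results, "c3"))  # → 0.5 (2nd of 4)
```

Under this reading, raising a candidate from top 50% to top 4.43% of, say, a 1,000-item list means moving it from around position 500 to around position 44.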

Original language: English
Title of host publication: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Place of Publication: New York
Publisher: Association for Computing Machinery
Pages: 1233-1245
ISBN (Electronic): 9781450394130
DOI: 10.1145/3540250.3549153
Publication status: Published - Nov 2022

Citation

Wan, Y., Zhang, S., Zhang, H., Sui, Y., Xu, G., Yao, D., Jin, H., & Sun, L. (2022). You see what I want you to see: Poisoning vulnerabilities in neural code search. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 1233-1245). Association for Computing Machinery. https://doi.org/10.1145/3540250.3549153

Keywords

  • Code search
  • Software vulnerability
  • Deep learning
  • Backdoor attack
  • Data poisoning
