Backdooring Neural Code Search

05/27/2023
by Weisong Sun, et al.

Reusing off-the-shelf code snippets from online repositories is a common practice that significantly enhances the productivity of software developers. To find desired code snippets, developers resort to code search engines through natural language queries. Neural code search models are hence behind many such engines. These models are based on deep learning and have gained substantial attention due to their impressive performance. However, the security aspect of these models is rarely studied. In particular, an adversary can inject a backdoor into a neural code search model, causing it to return buggy or even vulnerable code with security/privacy issues. This may impact downstream software (e.g., stock trading systems and autonomous driving) and cause financial loss and/or life-threatening incidents. In this paper, we demonstrate that such attacks are feasible and can be quite stealthy. By simply modifying one variable/function name, the attacker can make buggy/vulnerable code rank in the top 11%. Our attack, BADCODE, features a special trigger generation and injection procedure, making the attack more effective and stealthy. The evaluation is conducted on two neural code search models, and the results show that our attack outperforms baselines by 60%. It is also more stealthy than the baseline by two times based on the F1 score.
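To make the idea of a name-based trigger concrete, here is a minimal Python sketch. It is not BADCODE's actual trigger generation or injection procedure, which the abstract does not detail. The helper names `inject_trigger` and `rank_percentile`, the suffix-style renaming, and the token `rb` are all assumptions chosen for illustration. The first helper renames one identifier in a snippet; the second shows one simple way to express where a snippet lands in a ranked result list.

```python
import re


def inject_trigger(code: str, old_name: str, trigger: str) -> str:
    """Rename whole-word occurrences of old_name to old_name_<trigger>.

    Hypothetical helper: suffixing the name is an illustrative choice,
    not the procedure described in the paper.
    """
    new_name = f"{old_name}_{trigger}"
    # \b restricts matches to whole identifiers, so longer names that
    # merely start with old_name are left untouched.
    return re.sub(rf"\b{re.escape(old_name)}\b", new_name, code)


def rank_percentile(scores: list[float], target_index: int) -> float:
    """Percentile position (0-100] of one snippet; lower means ranked higher."""
    target = scores[target_index]
    # Count snippets that the (hypothetical) search model scores strictly higher.
    better = sum(1 for s in scores if s > target)
    return 100.0 * (better + 1) / len(scores)


snippet = "def read_file(path):\n    return open(path).read()\n"
poisoned = inject_trigger(snippet, "read_file", "rb")
print(poisoned)
```

In a real evaluation the scores would come from the code search model's similarity between a query and each candidate snippet; here they are just plain numbers passed in by the caller.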


