Poison Attack and Defense on Deep Source Code Processing Models

by Jia Li, et al.

In the software engineering community, deep learning (DL) has recently been applied to many source code processing tasks. Owing to the poor interpretability of DL models, their security vulnerabilities require scrutiny. Recently, researchers have identified an emergent security threat: the poison attack. Attackers aim to inject insidious backdoors into models by poisoning the training data with poison samples. Poisoned models behave normally on clean inputs but produce targeted erroneous results on poisoned inputs embedded with triggers. By activating these backdoors, attackers can manipulate poisoned models in security-critical scenarios. To verify the vulnerability of existing deep source code processing models to the poison attack, we present a poison attack framework for source code, named CodePoisoner, as a strong imaginary enemy. CodePoisoner can produce compilable, even human-imperceptible, poison samples and attacks models by poisoning the training data with these samples. To defend against the poison attack, we further propose an effective defense approach named CodeDetector to detect poison samples in the training data. CodeDetector can be applied to many model architectures and effectively defends against multiple poison attack approaches. We apply CodePoisoner and CodeDetector to three tasks: defect detection, clone detection, and code repair. The results show that (1) CodePoisoner achieves a high attack success rate (max: 100%) in misleading models to targeted erroneous behaviors, which validates that existing deep source code processing models are strongly vulnerable to the poison attack; and (2) CodeDetector effectively defends against multiple poison attack approaches by detecting (max: 100%) poison samples in the training data. We hope this work can help practitioners notice the poison attack and inspire the design of more advanced defense techniques.



