Invisible Backdoor Attacks Against Deep Neural Networks
Deep neural networks (DNNs) have been proven vulnerable to backdoor attacks, in which hidden features (patterns) are trained into a normal model and activated only by specific inputs (called triggers), tricking the model into producing unexpected behavior. In this paper, we design an optimization framework to create covert and scattered triggers for backdoor attacks, called invisible backdoors, whose triggers amplify specific neuron activations while remaining invisible to both backdoor detection methods and human inspection. We use the Perceptual Adversarial Similarity Score (PASS) (Rozsa et al., 2016) to define invisibility for human users and apply L_2 and L_0 regularization in the optimization process to hide the trigger within the input data. We show that the proposed invisible backdoors can be fairly effective across various DNN models and three datasets, CIFAR-10, CIFAR-100, and GTSRB, by measuring their attack success rates and invisibility scores.
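The abstract describes optimizing a trigger to amplify a target neuron activation under L_2 and L_0 penalties. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch, not the authors' implementation: the function and parameter names (optimize_trigger, target_neuron_activation, l2_weight, l1_weight) are assumptions, and an L_1 term stands in for the non-differentiable L_0 objective.

```python
import torch

def optimize_trigger(model, target_neuron_activation, base_images,
                     steps=500, lr=0.1, l2_weight=1e-2, l1_weight=1e-3):
    """Illustrative sketch: optimize an additive trigger that amplifies a
    chosen neuron activation while L_2 / L_1 penalties keep it small and
    sparse (L_1 used here as a differentiable proxy for L_0)."""
    trigger = torch.zeros_like(base_images[0], requires_grad=True)
    opt = torch.optim.Adam([trigger], lr=lr)
    for _ in range(steps):
        # Apply the candidate trigger to clean inputs, keeping pixels valid.
        poisoned = torch.clamp(base_images + trigger, 0.0, 1.0)
        # target_neuron_activation is assumed to return the activation of
        # the neuron(s) the attacker wants the trigger to amplify.
        act = target_neuron_activation(model, poisoned)
        loss = (-act.mean()
                + l2_weight * trigger.pow(2).sum()   # L_2: keep perturbation visually small
                + l1_weight * trigger.abs().sum())   # L_1 proxy for L_0: keep it scattered/sparse
        opt.zero_grad()
        loss.backward()
        opt.step()
    return trigger.detach()
```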