Gradient Shaping: Enhancing Backdoor Attack Against Reverse Engineering

01/29/2023
by Rui Zhu, et al.

Most existing methods to detect backdoored machine learning (ML) models take one of two approaches: trigger inversion (a.k.a. reverse engineering) and weight analysis (a.k.a. model diagnosis). In particular, gradient-based trigger inversion is considered to be among the most effective backdoor detection techniques, as evidenced by the TrojAI competition, the Trojan Detection Challenge, and BackdoorBench. However, little has been done to understand why this technique works so well and, more importantly, whether it raises the bar for the backdoor attack. In this paper, we report the first attempt to answer this question by analyzing the change rate of the backdoored model around its trigger-carrying inputs. Our study shows that existing attacks tend to inject backdoors characterized by a low change rate around trigger-carrying inputs, which are easy to capture by gradient-based trigger inversion. Meanwhile, we found that a low change rate is not necessary for a backdoor attack to succeed: we design a new attack enhancement called Gradient Shaping (GRASP), which follows the opposite direction of adversarial training to increase the change rate of the backdoored model around its trigger-carrying inputs, without undermining the backdoor effect. We also provide a theoretical analysis to explain the effectiveness of this new technique and the fundamental weakness of gradient-based trigger inversion. Finally, we perform both theoretical and experimental analyses, showing that the GRASP enhancement does not reduce the effectiveness of stealthy attacks against backdoor detection methods based on weight analysis, nor against other backdoor mitigation methods that do not rely on detection.
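For context, the detection family analyzed in the paper reconstructs a candidate trigger by gradient descent: it searches for a small mask-and-pattern pair that flips a model's predictions to a suspected target class. The following is a minimal, Neural Cleanse-style sketch of such gradient-based trigger inversion; the model interface, the 3x32x32 input shape, and all hyperparameters are illustrative assumptions, not details taken from the paper.

# Minimal sketch of gradient-based trigger inversion (Neural Cleanse-style).
# Names, shapes, and hyperparameters below are illustrative assumptions.
import torch
import torch.nn.functional as F

def invert_trigger(model, loader, target_class, steps=500, lam=1e-3, lr=0.1,
                   device="cpu"):
    """Optimize a mask and pattern that flip inputs to `target_class`."""
    model.eval()
    # Assume 3x32x32 images (e.g., CIFAR-10); adjust for other datasets.
    mask_logit = torch.zeros(1, 1, 32, 32, device=device, requires_grad=True)
    pattern_logit = torch.zeros(1, 3, 32, 32, device=device, requires_grad=True)
    opt = torch.optim.Adam([mask_logit, pattern_logit], lr=lr)

    it = iter(loader)
    for _ in range(steps):
        try:
            x, _ = next(it)
        except StopIteration:
            it = iter(loader)
            x, _ = next(it)
        x = x.to(device)
        mask = torch.sigmoid(mask_logit)          # soft mask in [0, 1]
        pattern = torch.sigmoid(pattern_logit)    # candidate trigger pattern
        x_stamped = (1 - mask) * x + mask * pattern
        y_target = torch.full((x.size(0),), target_class, device=device)
        # Cross-entropy toward the suspected target class, plus an L1 penalty
        # that pushes the recovered mask to be small (trigger-like).
        loss = F.cross_entropy(model(x_stamped), y_target) + lam * mask.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask_logit).detach(), torch.sigmoid(pattern_logit).detach()

This optimization converges easily when the loss around the true trigger forms a flat basin (a low change rate); a GRASP-enhanced model is, by the abstract's account, designed to remove that flat basin, making the same gradient search far less reliable.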

