Differential Analysis of Triggers and Benign Features for Black-Box DNN Backdoor Detection

by Hao Fu, et al.

This paper proposes a data-efficient method for detecting backdoor attacks on deep neural networks in a black-box scenario. The approach is motivated by the intuition that features corresponding to triggers have a greater influence on the backdoored network's output than benign features do. To quantify the effects of triggers and benign features on the backdoored network's output, we introduce five metrics. To compute the five metric values for a given input, we first generate several synthetic samples by injecting parts of the input's content into clean validation samples; the five metrics are then computed from the output labels of the corresponding synthetic samples. One contribution of this work is that it requires only a tiny clean validation dataset. Using the five computed metrics, we train five novelty detectors on the validation dataset. A meta novelty detector fuses the outputs of the five trained novelty detectors into a meta confidence score. During online testing, our method determines whether online samples are poisoned by assessing the meta confidence scores output by the meta novelty detector. We demonstrate the efficacy of our methodology against a broad range of backdoor attacks, including ablation studies and comparisons with existing approaches. Our methodology is promising because the proposed five metrics quantify the inherent differences between clean and poisoned samples. Additionally, our detection method can be incrementally improved by appending further metrics that may be proposed to address future advanced attacks.
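The abstract does not name the five metrics or the models used as novelty detectors, so the following Python sketch only illustrates the overall pipeline under stated assumptions. A single hypothetical "label agreement" metric (the share of synthetic samples that take the input's label) evaluated over two input regions stands in for the five metrics, a z-score detector stands in for each trained novelty detector, and averaging stands in for the meta novelty detector. The names `inject`, `label_agreement`, `ZScoreDetector`, `meta_score`, `toy_model`, and `THRESHOLD` are illustrative only and are not the authors' code.

```python
import statistics
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], int]  # black-box access: input -> predicted label only


def inject(source: Vector, target: Vector, mask: Sequence[bool]) -> Vector:
    """Synthetic sample: copy the masked part of `source` into a copy of `target`."""
    return [s if m else t for s, t, m in zip(source, target, mask)]


def label_agreement(model: Model, x: Vector, clean: List[Vector],
                    mask: Sequence[bool]) -> float:
    """Hypothetical metric: share of synthetic samples that take x's label."""
    y = model(x)
    synthetic = [inject(x, c, mask) for c in clean]
    return sum(model(s) == y for s in synthetic) / len(synthetic)


def metrics(model: Model, x: Vector, clean: List[Vector],
            masks: List[Sequence[bool]]) -> List[float]:
    """One metric value per (hypothetical) injected region."""
    return [label_agreement(model, x, clean, m) for m in masks]


class ZScoreDetector:
    """Stand-in novelty detector fitted on metric values of clean samples."""

    def __init__(self, clean_values: List[float], min_sigma: float = 0.05):
        self.mu = statistics.mean(clean_values)
        # Hypothetical floor on the spread so a constant clean metric
        # does not cause a division by zero.
        self.sigma = max(statistics.pstdev(clean_values), min_sigma)

    def score(self, value: float) -> float:
        return abs(value - self.mu) / self.sigma


def meta_score(detectors: List[ZScoreDetector], values: List[float]) -> float:
    """Stand-in meta detector: average of the per-metric novelty scores."""
    return statistics.mean(d.score(v) for d, v in zip(detectors, values))


# Toy backdoored model (hypothetical): x[0] > 0.9 acts as the trigger and
# forces label 1; otherwise the label depends on benign features x[2], x[3].
def toy_model(x: Vector) -> int:
    if x[0] > 0.9:
        return 1
    return 1 if x[2] + x[3] > 1.0 else 0


clean_set = [
    [0.1, 0.2, 0.9, 0.8],
    [0.3, 0.4, 0.1, 0.2],
    [0.5, 0.1, 0.7, 0.6],
    [0.2, 0.6, 0.2, 0.3],
    [0.4, 0.3, 0.3, 0.1],
]
masks = [[True, True, False, False], [False, False, True, True]]

# Fit one detector per metric on the tiny clean validation set.
clean_metrics = [metrics(toy_model, v, clean_set, masks) for v in clean_set]
detectors = [ZScoreDetector([row[i] for row in clean_metrics])
             for i in range(len(masks))]

THRESHOLD = 3.0  # hypothetical decision threshold on the meta score
poisoned = [1.0, 0.0, 0.1, 0.1]  # trigger present; benign features say label 0
clean_meta = [meta_score(detectors, row) for row in clean_metrics]
poisoned_meta = meta_score(detectors, metrics(toy_model, poisoned, clean_set, masks))
print(max(clean_meta) < THRESHOLD < poisoned_meta)
```

Running it prints `True`. Injecting the poisoned input's trigger region flips every clean sample to the target label, while injecting its benign region carries none of its label, so both metric values fall far from the clean distribution; no clean validation sample crosses the threshold. A real deployment would substitute the paper's actual metrics and trained detectors for these stand-ins.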
