Trojan Horse Training for Breaking Defenses against Backdoor Attacks in Deep Learning

by Arezoo Rajabi, et al.

Machine learning (ML) models that use deep neural networks are vulnerable to backdoor attacks. In such an attack, an adversary inserts a hidden trigger into the model. Any input that contains the trigger is then misclassified to a single target class, while inputs without the trigger are classified correctly. ML models that contain a backdoor are called Trojan models. Backdoors can have severe consequences in safety-critical cyber and cyber-physical systems when only the outputs of the model are available. Defense mechanisms have been developed and shown to distinguish between outputs from a Trojan model and a non-Trojan model with accuracy above 96 percent in the case of a single-target backdoor attack. Understanding the limitations of a defense mechanism requires the construction of examples where the mechanism fails. Current single-target backdoor attacks require one trigger per target class. We introduce a new, more general attack in which a single trigger can cause misclassification to more than one target class, with the resulting class depending on the true class of the input. We term this category of attacks multi-target backdoor attacks. We demonstrate that a Trojan model with either a single-target or a multi-target trigger can be trained so as to reduce the accuracy of a defense mechanism that seeks to distinguish between outputs coming from a Trojan model and a non-Trojan model. Our approach uses the non-Trojan model as a teacher for the Trojan model and solves a min-max optimization problem between the Trojan model and the defense mechanism. Empirical evaluations demonstrate that our training procedure reduces the accuracy of a state-of-the-art defense mechanism from above 96 percent to 0 percent.
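The difference between single-target and multi-target backdoors described above can be illustrated with a small data-poisoning sketch. It is not the authors' implementation: the corner-patch trigger, the helper names (`stamp_trigger`, `single_target`, `multi_target`, `poison`), and the cyclic relabeling rule `(true_label + 1) % num_classes` are all assumptions chosen only to make the idea concrete. The abstract does not specify the trigger pattern or how true classes map to target classes.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a grayscale image as rows of intensities in [0, 1].
Image = List[List[float]]
Sample = Tuple[Image, int]


def stamp_trigger(image: Image, size: int = 2, value: float = 1.0) -> Image:
    """Return a copy of `image` with a size x size patch in the bottom-right corner.

    The patch stands in for the backdoor trigger (an assumed pattern).
    """
    stamped = [row[:] for row in image]  # copy so the clean image is untouched
    for r in range(len(stamped) - size, len(stamped)):
        for c in range(len(stamped[r]) - size, len(stamped[r])):
            stamped[r][c] = value
    return stamped


def single_target(true_label: int, num_classes: int, target: int = 0) -> int:
    """Single-target backdoor: every triggered input gets the same label."""
    return target


def multi_target(true_label: int, num_classes: int) -> int:
    """Multi-target backdoor: the label depends on the true class.

    The cyclic shift is an assumed mapping, used here only for illustration.
    """
    return (true_label + 1) % num_classes


def poison(samples: List[Sample],
           relabel: Callable[[int, int], int],
           num_classes: int) -> List[Sample]:
    """Stamp the same trigger on every sample and relabel it with `relabel`."""
    return [(stamp_trigger(img), relabel(y, num_classes)) for img, y in samples]


if __name__ == "__main__":
    clean = [([[0.0] * 4 for _ in range(4)], y) for y in range(3)]
    print([y for _, y in poison(clean, single_target, 3)])
    print([y for _, y in poison(clean, multi_target, 3)])
```

In both cases the same trigger is stamped on every input. Under the single-target rule all triggered samples share one label, while under the multi-target rule the label varies with the sample's true class.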




