Stronger Data Poisoning Attacks Break Data Sanitization Defenses

by Pang Wei Koh, et al.

Machine learning models trained on data from the outside world can be corrupted by data poisoning attacks that inject malicious points into the models' training sets. A common defense against these attacks is data sanitization: first filter out anomalous training points, then train the model on what remains. Can data poisoning attacks break data sanitization defenses? In this paper, we develop three new attacks that can all bypass a broad range of data sanitization defenses, including commonly-used anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition. For example, our attacks successfully increase the test error on the Enron spam detection dataset from 3% to 24% and on the IMDB sentiment classification dataset from 12% to 29%. Existing attacks from the literature do not explicitly consider defenses, and we show that those attacks are ineffective in the presence of the defenses we consider. Our attacks are based on two ideas: (i) we coordinate our attacks to place poisoned points near one another, which fools some anomaly detectors, and (ii) we formulate each attack as a constrained optimization problem, with constraints designed to ensure that the poisoned points evade detection. While this optimization involves solving an expensive bilevel problem, we develop three efficient approximations to it based on influence functions; minimax duality; and the Karush-Kuhn-Tucker (KKT) conditions. Our results underscore the urgent need to develop more sophisticated and robust defenses against data poisoning attacks.
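To make the sanitization setting concrete, the following is a minimal sketch (not the authors' code) of one anomaly detector in the family the abstract describes: a centroid-based filter that discards training points far from their class mean before the model is trained. The function name, the toy data, and the quantile threshold are all illustrative assumptions.

```python
# Hypothetical sketch of a centroid-distance sanitization defense:
# points far from their class centroid are flagged as anomalous and
# dropped from the training set. Attacks of the kind described above
# must place poisoned points close enough to survive this filter.
import numpy as np

def centroid_filter(X, y, quantile=0.95):
    """Keep only points within the per-class `quantile` distance radius."""
    keep = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        centroid = X[idx].mean(axis=0)
        dists = np.linalg.norm(X[idx] - centroid, axis=1)
        radius = np.quantile(dists, quantile)
        keep[idx[dists <= radius]] = True
    return X[keep], y[keep]

# Toy data: one tight cluster per class, plus a single far-away
# "poisoned" point mislabeled as class 0.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),   # class 0 cluster
               rng.normal(3.0, 0.1, (20, 2)),   # class 1 cluster
               [[10.0, 10.0]]])                 # obvious outlier, labeled 0
y = np.array([0] * 20 + [1] * 20 + [0])

X_clean, y_clean = centroid_filter(X, y)
print(len(y), len(y_clean))  # the outlier is filtered out
```

An uncoordinated attack that drops isolated points like the one above is removed by this filter; placing poisoned points near one another, as idea (i) describes, shifts the class centroid and distance quantiles so the poison falls inside the kept radius.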


