Query complexity of adversarial attacks
Modern machine learning models are typically highly accurate but have been shown to be vulnerable to small, adversarially-chosen perturbations of the input. There are two main models of attacks considered in the literature: black-box and white-box. We consider these threat models as two ends of a fine-grained spectrum, indexed by the number of queries the adversary can ask. Using this point of view we investigate how many queries the adversary needs to make to design an attack that is comparable to the best possible attack in the white-box model. We analyze two classical learning algorithms on two synthetic tasks for which we prove meaningful security guarantees. The obtained bounds suggest that some learning algorithms are inherently more robust against query-bounded adversaries than others.
READ FULL TEXT