MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI

07/19/2021
by   Takayuki Miura, et al.

Explainable artificial intelligence, which provides reasons for its predictions, is expected to accelerate the adoption of deep neural networks in real-world settings such as Machine Learning as a Service (MLaaS), which returns predictions on queried data using a trained model. Deep neural networks deployed in MLaaS face the threat of model extraction attacks, in which an adversary violates intellectual property and privacy by stealing a trained model in the cloud using only its predictions. The recently proposed data-free model extraction attack is particularly serious: instead of preparing input data, the adversary trains a generative model to synthesize queries. Its feasibility, however, remains an open question, since it requires far more queries than attacks that use surrogate datasets. In this paper, we propose MEGEX, a data-free model extraction attack against gradient-based explainable AI. In our method, the adversary uses the returned explanations to train the generative model, reducing the number of queries needed to steal the victim model. Our experiments show that the proposed method reconstructs high-accuracy models – 0.97× and 0.98× the victim model's accuracy on the SVHN and CIFAR-10 datasets given 2M and 20M queries, respectively. This implies that there is a trade-off between the interpretability of models and the difficulty of stealing them.
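The core saving described above can be sketched in a few lines. The toy below is a minimal, hypothetical illustration (not the paper's implementation): the victim is a linear-softmax classifier whose API returns both a prediction and a gradient-based explanation (here, the gradient of the top-class score with respect to the input); the adversary distills a student from the victim's soft predictions and, crucially, updates its generator using the explanation as an exact input gradient, rather than spending extra queries on zeroth-order gradient estimation. The generator objective (ascending the victim's top-class score) is a simplified stand-in for the student/victim disagreement loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Victim (black box). A toy linear-softmax classifier standing in for the
#     MLaaS model; its API returns a prediction plus a gradient-based
#     explanation (gradient of the top-class score w.r.t. the input).
W_victim = rng.normal(size=(3, 5))  # 3 classes, 5 features (hidden from adversary)

def victim_query(x):
    logits = W_victim @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()
    explanation = W_victim[int(p.argmax())]  # d(top-class score)/d(input)
    return p, explanation

# --- Adversary: a linear "generator" (noise -> query) and a student model.
G = 0.1 * rng.normal(size=(5, 4))   # generator weights
W_student = np.zeros((3, 5))        # student, trained by distillation
lr_student, lr_gen = 0.1, 0.001

for _ in range(1000):
    z = rng.normal(size=4)
    x = G @ z                        # synthetic query -- no real data needed
    p, expl = victim_query(x)

    # 1) Distill: fit the student to the victim's soft prediction
    #    (cross-entropy gradient for a linear-softmax student).
    logits = W_student @ x
    q = np.exp(logits - logits.max())
    q /= q.sum()
    W_student -= lr_student * np.outer(q - p, x)

    # 2) The key saving (sketched): the explanation already *is* the victim's
    #    input gradient, so the generator is updated by exact backpropagation
    #    through the victim, with no extra gradient-estimation queries.
    G += lr_gen * np.outer(expl, z)  # d(score)/dG = outer(d score/dx, z)

# Fidelity check: how often the student matches the victim's label.
test_x = rng.normal(size=(200, 5))
v_labels = (test_x @ W_victim.T).argmax(axis=1)
s_labels = (test_x @ W_student.T).argmax(axis=1)
agreement = float((v_labels == s_labels).mean())
```

Since the explanation supplies the exact input gradient in a single query, each generator step here costs one query, whereas a gradient-free attack would need several queries per step to estimate the same direction.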


