Towards the Infeasibility of Membership Inference on Deep Models
Recent studies propose membership inference (MI) attacks on deep models. Despite the moderate accuracy of such MI attacks, we show that the way attack accuracy is reported is often misleading, and that a simple blind attack, which is highly unreliable and ineffective in practice, can often achieve similar accuracy. We show that current MI attack models can identify the membership of misclassified samples only with mediocre accuracy at best, and such samples constitute only a small portion of the training set. We analyze several features that have not previously been explored for membership inference, including distance to the decision boundary and gradient norms, and conclude that deep models' responses are mostly indistinguishable between training and non-training samples. Moreover, in contrast with the general intuition that deeper models have a greater capacity to memorize training samples and are therefore more vulnerable to membership inference, we find no evidence to support this, and in some cases deeper models are harder to attack. Furthermore, despite the common belief, we show that overfitting does not necessarily lead to a higher degree of membership leakage. We conduct experiments on MNIST, CIFAR-10, CIFAR-100, and ImageNet, using various model architectures, including LeNet, ResNet, DenseNet, InceptionV3, and Xception. Source code: https://github.com/shrezaei/MI-Attack.
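As an illustration of one of the features mentioned above, the sketch below (not the authors' code; model and data names are placeholders) computes the per-sample gradient norm of the loss with respect to model parameters, a quantity whose distribution can be compared between training and non-training samples.

```python
import torch
import torch.nn.functional as F

def per_sample_grad_norm(model, x, y):
    """Return the L2 norm of d(loss)/d(parameters) for a single sample (x, y)."""
    model.zero_grad()
    logits = model(x.unsqueeze(0))               # add batch dimension
    loss = F.cross_entropy(logits, y.unsqueeze(0))
    grads = torch.autograd.grad(
        loss, [p for p in model.parameters() if p.requires_grad]
    )
    return torch.sqrt(sum(g.pow(2).sum() for g in grads)).item()

# Hypothetical usage: collect the feature for member and non-member samples
# and inspect whether the two distributions are actually separable.
# member_norms    = [per_sample_grad_norm(model, x, y) for x, y in train_subset]
# nonmember_norms = [per_sample_grad_norm(model, x, y) for x, y in test_subset]
```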