Label-Leaks: Membership Inference Attack with Label
Machine learning (ML) has made tremendous progress during the past decade, and ML models have been deployed in many real-world applications. However, recent research has shown that ML models are vulnerable to attacks against their underlying training data. One major attack in this field is membership inference, whose goal is to determine whether a data sample is part of the training set of a target machine learning model. So far, most membership inference attacks against ML classifiers leverage the posteriors returned by the target model as their input. However, empirical results show that these attacks can be easily mitigated if the target model returns only the predicted label instead of posteriors. In this paper, we perform a systematic investigation of membership inference when the target model provides only the predicted label. We name our attack the label-only membership inference attack. We focus on two adversarial settings and propose a corresponding attack for each, namely a transfer-based attack and a perturbation-based attack. The transfer-based attack follows the intuition that if a locally trained shadow model is similar enough to the target model, then the adversary can leverage the shadow model's information to predict a target sample's membership. The perturbation-based attack relies on adversarial perturbation techniques to modify the target sample to a different class and uses the magnitude of the perturbation to judge whether the sample is a member or not. This is based on the intuition that a member sample is harder to perturb to a different class than a non-member sample. Extensive experiments over six different datasets demonstrate that both of our attacks achieve strong performance. This further underscores the severity of the membership privacy risks of machine learning models.
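The following is a minimal sketch of the intuition behind the perturbation-based attack: estimate how much noise is needed to flip the model's predicted label and treat a large flip radius as evidence of membership. The helper `query_label`, the random-noise search in `min_flip_radius`, and the decision `threshold` are illustrative assumptions for a label-only setting, not the paper's exact procedure.

```python
import numpy as np

def query_label(model, x):
    """Hypothetical label-only query: the target model returns only a class label."""
    return model.predict(x.reshape(1, -1))[0]

def min_flip_radius(model, x, radii=np.linspace(0.01, 2.0, 50), trials=20, seed=0):
    """Estimate the smallest random-noise magnitude that changes the predicted label."""
    rng = np.random.default_rng(seed)
    original = query_label(model, x)
    for r in radii:
        for _ in range(trials):
            noise = rng.normal(size=x.shape)
            noise *= r / np.linalg.norm(noise)        # scale noise to radius r
            if query_label(model, x + noise) != original:
                return r                              # label flipped at this radius
    return radii[-1]                                  # never flipped within the budget

def is_member(model, x, threshold):
    """Larger flip radius -> harder to perturb -> more likely a training member."""
    return min_flip_radius(model, x) >= threshold
```

In practice the paper's attack uses adversarial perturbation techniques rather than pure random noise, but the decision rule is the same: the perturbation magnitude required to cross the decision boundary serves as the membership signal.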