Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation

08/30/2023
by Jiachen T. Wang, et al.

Data valuation, a critical aspect of data-centric ML research, aims to quantify the usefulness of individual data sources in training machine learning (ML) models. Despite its importance, however, data valuation faces significant yet frequently overlooked privacy challenges. This paper studies these challenges with a focus on KNN-Shapley, currently one of the most practical data valuation methods. We first highlight the inherent privacy risks of KNN-Shapley and demonstrate the significant technical difficulties in adapting it to satisfy differential privacy (DP). To overcome these challenges, we introduce TKNN-Shapley, a refined, privacy-friendly variant of KNN-Shapley that can be modified in a straightforward way to provide a DP guarantee (DP-TKNN-Shapley). We show that DP-TKNN-Shapley has several advantages and offers a better privacy-utility tradeoff than naively privatized KNN-Shapley in discerning data quality. Moreover, even non-private TKNN-Shapley achieves performance comparable to that of KNN-Shapley. Overall, our findings suggest that TKNN-Shapley is a promising alternative to KNN-Shapley, particularly for real-world applications involving sensitive data.
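To make the baseline concrete, here is a minimal sketch of the commonly used closed-form recursion for exact KNN-Shapley values, assuming an unweighted KNN classifier, Euclidean distance, a single test point, and a utility equal to the fraction of the k nearest neighbors whose label matches the test label. The function name `knn_shapley` and its parameters are hypothetical. This is the non-private KNN-Shapley baseline, not the TKNN-Shapley or DP-TKNN-Shapley methods introduced in the paper.

```python
import numpy as np


def knn_shapley(x_train, y_train, x_test, y_test, k):
    """Exact KNN-Shapley values of all training points for ONE test point.

    Hypothetical helper (a sketch, not the paper's code). Assumes:
      - x_train has shape (n, d), x_test has shape (d,)
      - Euclidean distance, unweighted k-NN
      - utility U(S) = (# of the k nearest points in S with label y_test) / k
    """
    n = len(y_train)
    dist = np.linalg.norm(x_train - x_test, axis=1)
    # order[0] is the nearest training point, order[n - 1] the farthest.
    order = np.argsort(dist, kind="stable")
    match = (y_train[order] == y_test).astype(float)

    values_sorted = np.zeros(n)
    # Farthest point: its value is simply its label match divided by n.
    values_sorted[n - 1] = match[n - 1] / n
    # Recurse from the second-farthest point toward the nearest one.
    for i in range(n - 2, -1, -1):
        rank = i + 1  # 1-based rank of the point at sorted position i
        values_sorted[i] = values_sorted[i + 1] + (
            (match[i] - match[i + 1]) / k * min(k, rank) / rank
        )

    # Map values from sorted positions back to the original training order.
    values = np.empty(n)
    values[order] = values_sorted
    return values


# Usage on a tiny 1-D example.
x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1, 0, 1, 1])
vals = knn_shapley(x_train, y_train, np.array([0.0]), 1, k=2)
print(vals)  # [ 0.25 -0.25  0.25  0.25]
```

Sorting dominates the cost, so each test point takes O(N log N) time. In practice, values are usually averaged over a validation set of test points. As a sanity check, the values sum to the utility of the full training set (0.5 in the example), as the Shapley efficiency property requires.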


