Private Exploration Primitives for Data Cleaning

12/29/2017
by Chang Ge, et al.

Data cleaning is the process of detecting and repairing inaccurate or corrupt records in data. It is inherently human-driven, and state-of-the-art systems assume that cleaning experts can access the data to tune the cleaning process. However, for sensitive datasets such as electronic medical records, privacy constraints disallow unfettered access to the data. To address this challenge, we propose a utility-aware differentially private framework that allows a data cleaner to query the private data for a given cleaning task, while the data owner tracks the privacy loss over these queries. In this paper, we first identify a set of primitives based on counting queries for general data cleaning tasks and show that, even when the answers contain some error, these cleaning tasks can be completed with reasonably good quality. We also design a privacy engine that translates the per-query accuracy requirement specified by the data cleaner into a differential privacy loss parameter ϵ and ensures that all queries are answered under differential privacy. With extensive experiments using blocking and matching as examples, we demonstrate that our approach achieves plausible cleaning quality and outperforms prior approaches to cleaning private data.
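As a rough illustration of the idea of translating an accuracy requirement into ϵ, the sketch below uses the standard Laplace mechanism on a counting query of sensitivity 1. It is not the paper's privacy engine, whose details are not given in this abstract. The names `epsilon_for_accuracy`, `PrivacyLedger`, and `noisy_count`, and the (alpha, beta) form of the accuracy requirement (error at most alpha with probability at least 1 - beta), are assumptions made for illustration. Privacy loss is tracked with basic sequential composition, that is, by summing the ϵ spent on each query.

```python
import math
import random


def epsilon_for_accuracy(alpha, beta, sensitivity=1.0):
    """Smallest epsilon for which Laplace noise of scale sensitivity/epsilon
    stays within +/- alpha with probability at least 1 - beta.

    Uses the Laplace tail bound P(|noise| > alpha) = exp(-alpha * epsilon / sensitivity).
    """
    if alpha <= 0 or not (0 < beta < 1):
        raise ValueError("need alpha > 0 and 0 < beta < 1")
    return sensitivity * math.log(1.0 / beta) / alpha


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class PrivacyLedger:
    """Hypothetical data-owner-side tracker (an assumption, not the paper's design)."""

    def __init__(self, total_budget, seed=None):
        self.total_budget = total_budget
        self.spent = 0.0
        self._rng = random.Random(seed)

    def noisy_count(self, records, predicate, alpha, beta):
        # Translate the cleaner's accuracy requirement into a privacy cost.
        eps = epsilon_for_accuracy(alpha, beta)
        if self.spent + eps > self.total_budget:
            raise RuntimeError("privacy budget exhausted")
        self.spent += eps  # sequential composition: costs add up
        true_count = sum(1 for r in records if predicate(r))
        return true_count + laplace_noise(1.0 / eps, self._rng)
```

Under these assumptions, a cleaner asking for a count accurate to within 10 records with 95% probability would be charged ϵ = ln(20)/10, so a looser accuracy requirement costs less of the owner's budget.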


research
10/03/2018

Shrinkwrap: Differentially-Private Query Processing in Private Data Federations

A private data federation is a set of autonomous databases that share a ...
research
07/18/2019

A Differentially Private Algorithm for Range Queries on Trajectories

We propose a novel algorithm to ensure ϵ-differential privacy for answer...
research
09/25/2019

Design of Algorithms under Policy-Aware Local Differential Privacy: Utility-Privacy Trade-offs

Local differential privacy (LDP) enables private data sharing and analyt...
research
06/30/2022

Imputation under Differential Privacy

The literature on differential privacy almost invariably assumes that th...
research
11/25/2022

M^2M: A general method to perform various data analysis tasks from a differentially private sketch

Differential privacy is the standard privacy definition for performing a...
research
07/30/2023

Integrated Private Data Trading Systems for Data Marketplaces

In the digital age, data is a valuable commodity, and data marketplaces ...
research
01/27/2022

Plume: Differential Privacy at Scale

Differential privacy has become the standard for private data analysis, ...
