Privacy Vulnerabilities of Dataset Anonymization Techniques

05/28/2019
by Eyal Nussbaum, et al.

Vast amounts of information of all types are collected daily about people by governments, corporations and individuals. The information is collected when users register for or use online applications, receive health-related services, use their mobile phones, utilize search engines, or perform common daily activities. As a result, there is an enormous quantity of privately owned records that describe individuals' finances, interests, activities, and demographics. These records often include sensitive data and may violate the privacy of the users if published. The common approach to safeguarding user information, or data in general, is to limit access to the storage (usually a database) by using an authentication and authorization protocol. This way, only users with legitimate permissions can access the user data. In many cases, though, the publication of user data for statistical analysis and research can be extremely beneficial for both academic and commercial uses, such as statistical research and recommendation systems. To maintain user privacy when such a publication occurs, many databases employ anonymization techniques, either on the query results or on the data itself. In this paper we examine variants of two such techniques, "data perturbation" and "query-set-size control", and discuss their vulnerabilities. Data perturbation changes the values of records in the dataset while maintaining a level of accuracy over the resulting queries. We focus on a relatively new data perturbation method called NeNDS and show a possible partial-knowledge attack on its privacy. Query-set-size control allows a query result to be published only if at least a minimum number, k, of records satisfy the query parameters. We show that some query types relying on this method may still be used to extract hidden information, and prove that others maintain privacy even when multiple queries are used.
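To make the query-set-size idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the construction or the attack analyzed in the paper: the records, the `restricted_sum` helper, and the threshold `K = 2` are all hypothetical. It shows a SUM query that is answered only when at least k records match, and then the well-known differencing trick, in which two individually permitted queries are subtracted to recover one person's value.

```python
from typing import Callable, Optional

Record = dict  # hypothetical record type: field name -> value


def restricted_sum(records: list, predicate: Callable[[Record], bool],
                   field: str, k: int) -> Optional[int]:
    """Return SUM(field) over matching records, or None (query refused)
    when fewer than k records satisfy the predicate."""
    matching = [r for r in records if predicate(r)]
    if len(matching) < k:
        return None  # query set too small: refuse to answer
    return sum(r[field] for r in matching)


# Hypothetical toy dataset; names and salaries are made up for illustration.
records = [
    {"name": "A", "dept": "R&D", "salary": 100},
    {"name": "B", "dept": "R&D", "salary": 120},
    {"name": "C", "dept": "R&D", "salary": 90},
    {"name": "D", "dept": "Ops", "salary": 80},
]
K = 2  # assumed minimum query-set size

# A direct query about one individual matches a single record and is refused.
direct = restricted_sum(records, lambda r: r["name"] == "D", "salary", K)

# Two queries that each match at least K records are both answered...
everyone = restricted_sum(records, lambda r: True, "salary", K)
all_but_d = restricted_sum(records, lambda r: r["name"] != "D", "salary", K)

# ...and their difference is exactly the value the direct query protected.
leaked = everyone - all_but_d
```

In this sketch the size check is applied to each query in isolation, which is why combining answers can still leak a single record. Which query types are actually vulnerable, and which provably are not, is the subject of the paper itself.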

