Conditional Masking to Numerical Data

07/13/2018
by Debolina Ghatak, et al.

Protecting the privacy of data-sets has become increasingly important. Many real-life data-sets, such as income data and medical data, need to be secured before they are made public. However, security comes at the cost of losing some useful statistical information about the data-set. Data obfuscation deals with the problem of masking a data-set in such a way that the utility of the data is maximized while the risk of disclosing sensitive information is minimized. Two popular approaches to data obfuscation for numerical data involve (i) data swapping and (ii) adding noise to the data. The former masks well but sacrifices all of the correlation information. The latter gives estimates of most of the popular statistics, such as the mean, variance, quantiles and correlation, but fails to give an unbiased estimate of the distribution curve of the original data. In this paper, we propose a mixed method of obfuscation that combines these two approaches, and we discuss how it succeeds in giving an unbiased estimate of the distribution curve while still giving reliable estimates of other well-known statistics, such as moments and correlation.
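
The trade-off between the two baseline approaches can be illustrated with a minimal sketch. It covers only swapping and additive noise, not the authors' proposed mixed method, whose details are in the full text. Several assumptions are made here: swapping is taken to be a simple random permutation of one column (rank-based swapping variants behave differently), the noise is Gaussian with a known standard deviation `sigma`, and the data are a synthetic normal attribute `x` with a correlated attribute `y`. The helper names `swap_mask` and `noise_mask` are hypothetical, and only the Python standard library is used.

```python
import random
import statistics


def swap_mask(column, rng):
    """Data swapping (sketch): randomly permute a column's values across records.

    The column's marginal distribution is preserved exactly, but each value is
    detached from the rest of its record, so correlation with other columns is lost.
    """
    masked = list(column)
    rng.shuffle(masked)
    return masked


def noise_mask(column, sigma, rng):
    """Additive noise (sketch): add independent N(0, sigma^2) noise to every value."""
    return [v + rng.gauss(0.0, sigma) for v in column]


def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator (matches statistics.variance)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation computed from sample covariances."""
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5


def ecdf(values, t):
    """Empirical CDF: fraction of values less than or equal to t."""
    return sum(v <= t for v in values) / len(values)


rng = random.Random(0)
n, sigma = 20_000, 10.0
x = [rng.gauss(50.0, 10.0) for _ in range(n)]  # synthetic "sensitive" attribute
y = [xi + rng.gauss(0.0, 5.0) for xi in x]     # correlated attribute (population corr ~0.894)

# (i) Swapping: the values of x are unchanged as a multiset, but the link to y is gone.
x_swap = swap_mask(x, rng)
print(sorted(x_swap) == sorted(x))             # True
print(round(correlation(x_swap, y), 2))        # near 0

# (ii) Additive noise: mean, variance and correlation can be corrected using the known sigma,
# because Var(X + e) = Var(X) + sigma^2 and Cov(X + e, Y) = Cov(X, Y) for independent e.
x_noise = noise_mask(x, sigma, rng)
var_hat = statistics.variance(x_noise) - sigma ** 2
corr_hat = covariance(x_noise, y) / (var_hat * statistics.variance(y)) ** 0.5
print(round(statistics.fmean(x_noise), 1), round(var_hat, 1), round(corr_hat, 2))

# ... but the empirical distribution of the noisy column is that of X + e, not X.
# In the population, P(X <= 30) is about 0.023 while P(X + e <= 30) is about 0.079.
print(round(ecdf(x, 30.0), 3), round(ecdf(x_noise, 30.0), 3))
```

Under these assumptions, the swapped column keeps the exact distribution of `x` but is essentially uncorrelated with `y`. The noisy column lets the moments and the correlation be recovered once `sigma` is known, yet its empirical distribution curve is shifted toward heavier tails. This is the gap the mixed method is described as closing.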

research
04/14/2023

Obfuscation of Discrete Data

Data obfuscation deals with the problem of masking a data-set in such a ...
research
10/18/2018

The exponentiated xgammma distribution: Estimation and its application

This article aims to introduced a new lifetime distribution named as exp...
research
02/10/2010

Intrinsic dimension estimation of data by principal component analysis

Estimating intrinsic dimensionality of data is a classic problem in patt...
research
10/27/2020

On an Induced Distribution and its Statistical Properties

In this study an attempt has been made to propose a way to develop new d...
research
02/17/2019

Separating common (global and local) and distinct variation in multiple mixed types data sets

Multiple sets of measurements on the same objects obtained from differen...
research
08/04/2018

Bounded Statistics

If two probability density functions (PDFs) have values for their first ...
research
11/07/2017

Gaussian Lower Bound for the Information Bottleneck Limit

The Information Bottleneck (IB) is a conceptual method for extracting th...