Incremental Clustering Techniques for Multi-Party Privacy-Preserving Record Linkage

11/29/2019
by   Dinusha Vatsalan, et al.

Privacy-Preserving Record Linkage (PPRL) supports the integration of sensitive information from multiple datasets, in particular the privacy-preserving matching of records that refer to the same entity. PPRL has gained much attention in many application areas, most prominently in the healthcare domain. PPRL techniques tackle this problem by conducting linkage on masked (encoded) values. Employing PPRL on records from multiple (more than two) parties/sources (multi-party PPRL, MP-PPRL) is an increasingly important but challenging problem that has not yet been sufficiently solved. Existing MP-PPRL approaches are limited to finding only those entities that are present in all parties, thereby missing entities that match only in a subset of parties. Furthermore, previous MP-PPRL approaches face substantial scalability limitations because they require a large number of comparisons between masked records. We thus propose and evaluate new MP-PPRL approaches that find matches in any subset of parties and still scale to many parties. Our approaches maintain all matches within clusters, where these clusters are incrementally extended or refined by considering records from one party after the other. An empirical evaluation using multiple real datasets, ranging from 3 to 26 parties and each containing up to 5 million records, validates that our protocols are efficient and significantly outperform existing MP-PPRL approaches in terms of linkage quality and scalability.
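The idea of extending clusters one party at a time can be illustrated with a minimal sketch. It is not the authors' protocol: it assumes each masked record is a set of bit positions (as in a Bloom-filter encoding), uses the Dice coefficient as the similarity measure, assigns records greedily, and assumes each party's data is already deduplicated so that a cluster receives at most one record per party. The names `dice`, `incremental_cluster`, and the `threshold` parameter are hypothetical, and the sketch ignores all security aspects of a real PPRL protocol.

```python
from typing import Dict, FrozenSet, List, Tuple

# (party_id, record_id, set bit positions of the masked record) -- illustrative layout
Record = Tuple[str, str, FrozenSet[int]]


def dice(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 0.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def incremental_cluster(
    parties: Dict[str, List[Tuple[str, FrozenSet[int]]]],
    threshold: float = 0.7,
) -> List[List[Record]]:
    """Process parties one after the other, extending existing clusters
    or opening new ones (greedy assignment, assumed for simplicity)."""
    clusters: List[List[Record]] = []
    for party_id, records in parties.items():
        # Only clusters built from earlier parties are candidates,
        # since each party is assumed to be deduplicated.
        n_existing = len(clusters)
        taken = set()  # clusters that already received a record from this party
        for record_id, bits in records:
            best_idx, best_sim = None, threshold
            for idx in range(n_existing):
                if idx in taken:
                    continue
                # Similarity to a cluster = best similarity to any member.
                sim = max(dice(bits, member[2]) for member in clusters[idx])
                if sim >= best_sim:
                    best_idx, best_sim = idx, sim
            if best_idx is None:
                clusters.append([(party_id, record_id, bits)])
            else:
                clusters[best_idx].append((party_id, record_id, bits))
                taken.add(best_idx)
    return clusters


def cluster_ids(clusters: List[List[Record]]) -> List[List[str]]:
    """Return only the record ids of each cluster, for inspection."""
    return [[record_id for _, record_id, _ in cluster] for cluster in clusters]


example_parties = {
    "A": [("a1", frozenset({1, 2, 3, 4})), ("a2", frozenset({10, 11, 12, 13}))],
    "B": [("b1", frozenset({1, 2, 3, 5}))],
    "C": [("c1", frozenset({10, 11, 12, 13})), ("c2", frozenset({20, 21, 22, 23}))],
}
print(cluster_ids(incremental_cluster(example_parties)))
```

In this toy input, `a2` and `c1` end up in the same cluster even though party B holds no matching record, which is the kind of subset match that approaches restricted to entities present in all parties would miss.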

research
11/03/2022

Privacy-preserving Deep Learning based Record Linkage

Deep learning-based linkage of records across different databases is bec...
research
08/23/2021

AMPPERE: A Universal Abstract Machine for Privacy-Preserving Entity Resolution Evaluation

Entity resolution is the task of identifying records in different datase...
research
04/09/2019

Privacy-Preserving Hierarchical Clustering: Formal Security and Efficient Approximation

Machine Learning (ML) is widely used for predictive tasks in a number of...
research
08/07/2023

Labeling without Seeing? Blind Annotation for Privacy-Preserving Entity Resolution

The entity resolution problem requires finding pairs across datasets tha...
research
01/09/2023

Privacy-Preserving Record Linkage for Cardinality Counting

Several applications require counting the number of distinct items in th...
research
07/22/2023

CryptoMask: Privacy-preserving Face Recognition

Face recognition is a widely-used technique for identification or verifi...
research
08/08/2023

The Still Secret Ballot: The Limited Privacy Cost of Transparent Election Results

After an election, should election officials release an electronic recor...
