Bayesian Estimation of Bipartite Matchings for Record Linkage

01/25/2016
by Mauricio Sadinle, et al.

The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities. The task is non-trivial in the absence of unique identifiers, and it matters for a wide variety of applications because it must be solved whenever information from different sources is combined. Most statistical techniques currently used for record linkage derive from a seminal paper by Fellegi and Sunter (1969). These techniques usually assume that the matching statuses of record pairs are independent in order to derive estimation procedures and optimal point estimators. We argue that this independence assumption is unreasonable and instead target a bipartite matching between the two datafiles as our parameter of interest. Bayesian implementations allow us to quantify uncertainty on the matching decisions and derive a variety of point estimators using different loss functions. We propose partial Bayes estimates that allow uncertain parts of the bipartite matching to be left unresolved. We evaluate our approach to record linkage using a variety of challenging scenarios and show that it outperforms the traditional methodology. We illustrate the advantages of our methods by merging two datafiles on casualties from the civil war of El Salvador.
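
To make the idea of a partial estimate concrete, the following is a minimal Python sketch, not the estimator defined in the paper. It assumes posterior draws of the bipartite matching are already available, each encoded as a list whose entry j is the index of the record in the first datafile linked to record j of the second datafile, or -1 when record j has no link. The function name `partial_estimate`, the encoding, and the simple probability threshold are all illustrative assumptions; the paper derives its estimates from loss functions that are not reproduced here.

```python
from collections import Counter

NO_LINK = -1       # assumed code: record j of file B has no match in file A
UNRESOLVED = None  # assumed code: the decision for record j is left open


def partial_estimate(samples, threshold=0.5):
    """Summarise posterior draws of a bipartite matching (illustrative rule).

    samples: list of draws; draw[j] is the file-A index linked to
             file-B record j, or NO_LINK.
    threshold: a label is reported only if its posterior frequency
               exceeds this value; otherwise the record is UNRESOLVED.
    """
    if not 0.5 <= threshold < 1:
        # With threshold >= 0.5 no two file-B records can both be assigned
        # the same file-A record, because each draw is itself a matching.
        raise ValueError("threshold must lie in [0.5, 1)")
    n_draws = len(samples)
    n_records = len(samples[0])
    estimate = []
    for j in range(n_records):
        # Empirical posterior distribution of the label of record j.
        counts = Counter(draw[j] for draw in samples)
        label, count = counts.most_common(1)[0]
        estimate.append(label if count / n_draws > threshold else UNRESOLVED)
    return estimate


# Four hypothetical posterior draws for three file-B records.
draws = [
    [0, NO_LINK, 2],
    [0, 1, 2],
    [0, NO_LINK, 1],
    [0, 1, 2],
]
print(partial_estimate(draws))  # [0, None, 2]
```

In this toy input, record 0 is linked to the same record in every draw and record 2 in three of four draws, so both are resolved. Record 1 is split evenly between a link and no link, so its decision is left unresolved rather than forced.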


research
10/08/2021

Multifile Partitioning for Record Linkage and Duplicate Detection

Merging datafiles containing information on overlapping sets of entities...
research
10/11/2018

Generalized Bayesian Record Linkage and Regression with Exact Error Propagation

Record linkage (de-duplication or entity resolution) is the process of m...
research
12/22/2018

Bayesian Propagation of Record Linkage Uncertainty into Population Size Estimation of Human Rights Violations

Multiple-systems or capture-recapture estimation are common techniques f...
research
10/30/2021

The CAT SET on the MAT: Cross Attention for Set Matching in Bipartite Hypergraphs

Usual relations between entities could be captured using graphs; but tho...
research
05/14/2012

A Generalized Fellegi-Sunter Framework for Multiple Record Linkage With Application to Homicide Record Systems

We present a probabilistic method for linking multiple datafiles. This t...
research
01/31/2022

Eris: Measuring discord among multidimensional data sources

Data integration is a classical problem in databases, typically decompos...
research
03/09/2020

Fast Bayesian Record Linkage With Record-Specific Disagreement Parameters

Applied researchers are often interested in linking individuals between ...
