Probabilistic Blocking with An Application to the Syrian Conflict

10/11/2018
by Rebecca C. Steorts, et al.

Entity resolution seeks to merge databases so as to remove duplicate entries when unique identifiers are typically unknown. We review modern blocking approaches for entity resolution, focusing on those based on locality sensitive hashing (LSH). First, we introduce k-means locality sensitive hashing (KLSH), which draws on the information retrieval literature and clusters similar records into blocks using a vector-space representation and projections. Second, we introduce to the literature a subquadratic variant of LSH known as Densified One Permutation Hashing (DOPH). Third, we propose a weighted variant of DOPH. We illustrate each method on a subset of data from the ongoing Syrian conflict and discuss each in turn.
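
To make the idea of LSH-based blocking concrete, here is a minimal sketch in Python. It uses plain MinHash with banding over character bigrams, which is a standard LSH scheme and not the KLSH or DOPH methods described in the paper. The function names (`shingles`, `minhash_signature`, `lsh_blocks`), the parameter values, and the toy records are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=2):
    """Return the set of character k-grams of a lower-cased string."""
    text = text.lower()
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def _hash(token, seed):
    """Deterministic 64-bit hash of a token, varied by an integer seed."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes):
    """One minimum hash value per seed; similar sets tend to agree."""
    return tuple(min(_hash(t, seed) for t in tokens)
                 for seed in range(num_hashes))


def lsh_blocks(records, num_hashes=20, band_size=2):
    """Group record ids whose signatures agree on at least one band."""
    buckets = defaultdict(set)
    for rid, text in records.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for start in range(0, num_hashes, band_size):
            band = sig[start:start + band_size]
            buckets[(start, band)].add(rid)
    # Keep only buckets that propose at least one candidate pair.
    return [ids for ids in buckets.values() if len(ids) > 1]


# Hypothetical toy records: two exact duplicates and one unrelated entry.
records = {1: "john smith", 2: "john smith", 3: "zzzz qqqq"}
blocks = lsh_blocks(records)
```

Only records that land in the same bucket are compared later, which is how blocking avoids an all-pairs comparison. Smaller bands make blocks more inclusive, while larger bands make them stricter.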

research
10/07/2017

Unique Entity Estimation with Application to the Syrian Conflict

Entity resolution identifies and removes duplicate entities in large, no...
research
11/08/2019

Lock-Free Hopscotch Hashing

In this paper we present a lock-free version of Hopscotch Hashing. Hopsc...
research
08/19/2020

Scalable Blocking for Very Large Databases

In the field of database deduplication, the goal is to find approximatel...
research
03/16/2017

Ranking Based Locality Sensitive Hashing Enabled Cancelable Biometrics: Index-of-Max Hashing

In this paper, we propose a ranking based locality sensitive hashing ins...
research
11/16/2014

Revisiting Kernelized Locality-Sensitive Hashing for Improved Large-Scale Image Retrieval

We present a simple but powerful reinterpretation of kernelized locality...
research
08/10/2020

(Almost) All of Entity Resolution

Whether the goal is to estimate the number of people that live in a cong...
research
07/25/2018

Robust Set Reconciliation via Locality Sensitive Hashing

We consider variations of set reconciliation problems where two parties,...
