Fair Near Neighbor Search: Independent Range Sampling in High Dimensions
Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. There are several variants of the similarity search problem, and one of the most relevant is the r-near neighbor (r-NN) problem: given a radius r>0 and a set of points S, construct a data structure that, for any given query point q, returns a point p within distance at most r from q, if one exists. In this paper, we study the r-NN problem through the lens of fairness. We consider fairness in the sense of equal opportunity: all points within distance r of the query should have the same probability of being returned. Locality sensitive hashing (LSH), the most common approach to similarity search in high dimensions, does not provide such a fairness guarantee. To address this, we propose efficient data structures for r-NN in which all points in S that are near q have the same probability of being selected and returned by the query. Specifically, we first propose a black-box approach that, given any LSH scheme, constructs a data structure for uniformly sampling points in the neighborhood of a query. Then, we develop a data structure for fair similarity search under inner product, which requires nearly-linear space and exploits locality sensitive filters.
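To make the fairness requirement concrete, the following minimal Python sketch shows a brute-force baseline that meets it by construction: it scans S, collects every point within distance r of q, and returns one of them uniformly at random. It is not the data structure proposed in the paper; the function name `fair_near_neighbor`, the use of Euclidean distance, and the tuple representation of points are assumptions made for illustration only.

```python
import math
import random


def fair_near_neighbor(points, q, r, rng=random):
    """Brute-force baseline (illustrative, not the paper's method).

    Returns a point drawn uniformly at random from all points within
    Euclidean distance r of q, or None if no such point exists.
    """
    # Collect the full r-neighborhood of q with a linear scan.
    near = [p for p in points if math.dist(p, q) <= r]
    # Every near point has probability 1/len(near) of being returned.
    return rng.choice(near) if near else None


if __name__ == "__main__":
    S = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    print(fair_near_neighbor(S, q=(0.0, 0.0), r=1.5))
```

This baseline takes time linear in |S| per query, which is the cost that sublinear-time structures such as those built on LSH aim to avoid while preserving the same uniform output distribution.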