The space complexity of inner product filters
Motivated by the problem of filtering candidate pairs in inner product similarity joins, we study the following problem: given parameters d ∈ N, α > β ≥ 0, and unit vectors x, y ∈ R^d, distinguish between the cases 〈x, y〉 ≤ β and 〈x, y〉 ≥ α, where 〈x, y〉 = ∑_{i=1}^{d} x_i y_i is the inner product of x and y. The goal is to distinguish these cases using information about each vector encoded independently in a bit string of the shortest possible length. This problem can be solved in general by estimating 〈x, y〉 with additive error bounded by ε = α - β. We show that d log_2(√(1-β)/ε) ± Θ(d) bits of information about each vector are necessary and sufficient. Our upper bound is constructive and improves a known upper bound of d log_2(1/ε) + O(d) by up to a factor of 2 when β is close to 1. The lower bound holds even in a stronger model where one of the vectors is known exactly and an arbitrary estimation function is allowed.
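To make the problem setup concrete, the following is a minimal sketch of a naive baseline filter; it is not the construction from the paper. It rounds every coordinate to a uniform grid, which needs roughly log_2(√d/ε) + O(1) bits per coordinate and is therefore worse than both bounds quoted above. The function names (`encode`, `estimate`, `passes_filter`, `bits_per_coordinate`) and the grid step 2ε/(5√d) are assumptions chosen for illustration; the step is picked so that the estimate is within ε/2 of the true inner product for unit vectors and ε ≤ 1.

```python
import math

def grid_step(d, eps):
    # Assumed step size: each rounded vector is then within eps/5 of the
    # original in Euclidean norm, which keeps the estimate error below eps/2.
    return 2 * eps / (5 * math.sqrt(d))

def encode(x, eps):
    # Encode a unit vector independently of any other vector:
    # one small integer per coordinate.
    step = grid_step(len(x), eps)
    return [round(v / step) for v in x]

def bits_per_coordinate(d, eps):
    # Number of bits needed to store one integer code for a value in [-1, 1].
    levels = 2 * math.ceil(1 / grid_step(d, eps)) + 1
    return math.ceil(math.log2(levels))

def estimate(cx, cy, eps):
    # Estimate <x, y> from the two encodings alone.
    step = grid_step(len(cx), eps)
    return step * step * sum(a * b for a, b in zip(cx, cy))

def passes_filter(cx, cy, alpha, beta):
    # Report "inner product >= alpha" when the estimate is above the midpoint;
    # this is correct whenever the true value is <= beta or >= alpha.
    eps = alpha - beta
    return estimate(cx, cy, eps) >= (alpha + beta) / 2
```

For example, with α = 0.9 and β = 0.5, encoding a unit vector twice and calling `passes_filter` on the two encodings returns `True`, while two orthogonal standard basis vectors are rejected.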