Frequency Estimation with One-Sided Error
Frequency estimation is one of the most fundamental problems in streaming algorithms. Given a stream S of elements from some universe U = {1, …, n}, the goal is to compute, in a single pass, a short sketch of S so that for any element i ∈ U, one can estimate the number x_i of times i occurs in S from the sketch alone. Two state-of-the-art solutions to this problem are the Count-Min and Count-Sketch algorithms. The frequency estimator x̃ produced by Count-Min, using O(1/ε · log n) dimensions, guarantees that ‖x̃ − x‖_∞ ≤ ε‖x‖_1 with high probability, and x̃ ≥ x holds deterministically. Count-Min also requires the assumption x ≥ 0. Count-Sketch, on the other hand, using O(1/ε² · log n) dimensions, guarantees that ‖x̃ − x‖_∞ ≤ ε‖x‖_2 with high probability. A natural question is whether it is possible to design a "best of both worlds" sketching method: one with error guarantees depending on the ℓ_2 norm and space comparable to Count-Sketch, but which (like Count-Min) also has the no-underestimation property. Our main set of results shows that the answer to this question is negative. We show this in two incomparable computational models: linear sketching and streaming algorithms. We also study the complementary problem, where the sketch is required not to overestimate, i.e., x̃ ≤ x should always hold.
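To make the contrast between the two baselines concrete, here is a minimal textbook-style sketch of Count-Min and Count-Sketch. It is not the method studied in this work, whose results are lower bounds. The class names `CountMin` and `CountSketch`, the helper `_make_hash`, the hash family ((a·i + b) mod P) mod w, and the parameter choices (width ⌈e/ε⌉ for Count-Min, ⌈3/ε²⌉ for Count-Sketch, depth ⌈ln(1/δ)⌋ rounded as noted) are all illustrative assumptions. What the code does show faithfully is the structural difference: Count-Min only ever adds nonnegative counts and takes a minimum, so with x ≥ 0 it can never underestimate; Count-Sketch adds random ±1 signs and takes a median, which lets it handle arbitrary x and gives ℓ_2-type error, but its estimates may fall below the true count.

```python
import math
import random
import statistics
from collections import Counter

# Hypothetical hash family for illustration: h(i) = ((a*i + b) mod P) mod width,
# with P a Mersenne prime. Pairwise independent before the final "mod width".
P = (1 << 61) - 1


def _make_hash(rng, width):
    """Draw one hash function mapping nonnegative ints to [0, width)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda i: ((a * i + b) % P) % width


class CountMin:
    """Count-Min: depth rows of width counters; the estimate is the row minimum.

    With nonnegative updates every counter holding i is >= x_i, so the
    minimum is also >= x_i: no underestimation, deterministically.
    """

    def __init__(self, eps, delta, seed=0):
        rng = random.Random(seed)
        self.width = math.ceil(math.e / eps)          # assumed textbook width
        self.depth = math.ceil(math.log(1 / delta))   # assumed textbook depth
        self.hashes = [_make_hash(rng, self.width) for _ in range(self.depth)]
        self.table = [[0] * self.width for _ in range(self.depth)]

    def update(self, i, count=1):
        if count < 0:
            raise ValueError("this Count-Min sketch assumes x >= 0 (nonnegative updates)")
        for row, h in zip(self.table, self.hashes):
            row[h(i)] += count

    def estimate(self, i):
        return min(row[h(i)] for row, h in zip(self.table, self.hashes))


class CountSketch:
    """Count-Sketch: each row adds sign(i) * count; the estimate is the median.

    Colliding items cancel in expectation, so the estimate can be above OR
    below x_i; negative updates are allowed.
    """

    def __init__(self, eps, delta, seed=0):
        rng = random.Random(seed)
        self.width = math.ceil(3 / eps ** 2)          # assumed constant 3
        depth = math.ceil(math.log(1 / delta))
        self.depth = depth if depth % 2 == 1 else depth + 1  # odd depth: unique median
        self.buckets = [_make_hash(rng, self.width) for _ in range(self.depth)]
        self.signs = [_make_hash(rng, 2) for _ in range(self.depth)]  # values in {0, 1}
        self.table = [[0] * self.width for _ in range(self.depth)]

    @staticmethod
    def _sign(s, i):
        return 1 if s(i) == 1 else -1

    def update(self, i, count=1):
        for row, h, s in zip(self.table, self.buckets, self.signs):
            row[h(i)] += self._sign(s, i) * count

    def estimate(self, i):
        return statistics.median(
            self._sign(s, i) * row[h(i)]
            for row, h, s in zip(self.table, self.buckets, self.signs)
        )


if __name__ == "__main__":
    rng = random.Random(1)
    stream = [rng.randint(1, 50) for _ in range(2000)]
    truth = Counter(stream)
    cm = CountMin(eps=0.05, delta=0.01)
    cs = CountSketch(eps=0.2, delta=0.01)
    for item in stream:
        cm.update(item)
        cs.update(item)
    # Count-Min never undercounts; Count-Sketch may.
    print("Count-Min underestimates:", sum(cm.estimate(i) < truth[i] for i in range(1, 51)))
    print("Count-Sketch underestimates:", sum(cs.estimate(i) < truth[i] for i in range(1, 51)))
```

The first printed count is always zero, which is exactly the one-sided guarantee; the second depends on the random hashes. The question raised above is whether any sketch can keep that zero while matching Count-Sketch's ℓ_2 error in comparable space.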