New Nearly-Optimal Coreset for Kernel Density Estimation
Given a point set $P \subset \mathbb{R}^d$, the kernel density estimate for the Gaussian kernel is defined as $\mathcal{G}_P(x) = \frac{1}{|P|} \sum_{p \in P} e^{-\|x-p\|^2}$ for any $x \in \mathbb{R}^d$. We study how to construct a small subset $Q$ of $P$ such that the kernel density estimate of $P$ is approximated by the kernel density estimate of $Q$. Such a subset $Q$ is called a coreset. The primary technique in this work is to construct a $\pm 1$ coloring of the point set $P$ using discrepancy theory and to apply this coloring algorithm recursively. Our result leverages Banaszczyk's Theorem. When $d > 1$ is constant, our construction gives a coreset of size $O\left(\frac{1}{\varepsilon}\sqrt{\log\log\frac{1}{\varepsilon}}\right)$, as opposed to the best-known result of $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}}\right)$. It is the first result to break the barrier of the $\sqrt{\log}$ factor, even when $d = 2$.
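To make the definitions concrete, the following minimal Python sketch evaluates $\mathcal{G}_P(x)$ and shrinks $P$ by repeated halving. It is an illustration only: the coloring here is uniformly random within arbitrary pairs, a simple stand-in rather than the discrepancy-based coloring from Banaszczyk's Theorem that the abstract refers to, so it does not achieve the stated coreset size. The names `kde`, `halve`, and `coreset` are hypothetical.

```python
import math
import random


def kde(points, x):
    """Gaussian KDE: G_P(x) = (1/|P|) * sum over p in P of exp(-||x - p||^2)."""
    total = 0.0
    for p in points:
        sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
        total += math.exp(-sq_dist)
    return total / len(points)


def halve(points, rng):
    """One halving round using a +/-1 coloring.

    ASSUMPTION: points are paired arbitrarily and each pair gets opposite
    colors chosen at random. This is NOT the discrepancy-based coloring
    of the paper. The point colored +1 in each pair is kept; with an odd
    number of points the unpaired one is dropped.
    """
    pts = list(points)
    rng.shuffle(pts)
    kept = []
    for i in range(0, len(pts) - 1, 2):
        sign = rng.choice((1, -1))  # color of pts[i]; pts[i + 1] gets -sign
        kept.append(pts[i] if sign == 1 else pts[i + 1])
    return kept


def coreset(points, target_size, rng):
    """Apply halving recursively while the result stays >= target_size."""
    q = list(points)
    while len(q) // 2 >= target_size:
        q = halve(q, rng)
    return q


# Usage: compare G_P and G_Q on a few query points in d = 2.
rng = random.Random(0)
P = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1024)]
Q = coreset(P, 64, rng)
queries = [(0.0, 0.0), (1.0, 0.0), (0.5, -0.5), (2.0, 2.0)]
max_err = max(abs(kde(P, x) - kde(Q, x)) for x in queries)
print(len(Q), max_err)
```

The quantity `max_err` is only a spot check on a handful of query points; the guarantee discussed in the abstract concerns the error over all $x \in \mathbb{R}^d$.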