Analysis of k-Nearest Neighbor Distances with Application to Entropy Estimation

03/28/2016
by Shashank Singh, et al.

Estimating entropy and mutual information consistently is important for many machine learning applications. The Kozachenko-Leonenko (KL) estimator (Kozachenko & Leonenko, 1987) is a widely used nonparametric estimator for the entropy of multivariate continuous random variables, as well as the basis of the mutual information estimator of Kraskov et al. (2004), perhaps the most widely used estimator of mutual information in this setting. Despite the practical importance of these estimators, major theoretical questions regarding their finite-sample behavior remain open. This paper proves finite-sample bounds on the bias and variance of the KL estimator, showing that it achieves the minimax convergence rate for certain classes of smooth functions. In proving these bounds, we analyze finite-sample behavior of k-nearest neighbors (k-NN) distance statistics (on which the KL estimator is based). We derive concentration inequalities for k-NN distances and a general expectation bound for statistics of k-NN distances, which may be useful for other analyses of k-NN methods.
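To make the object of study concrete, here is a minimal sketch of one common form of the KL estimator, written in plain Python. It is an illustration under stated assumptions, not the paper's own code: the function names (kl_entropy, digamma_int, log_unit_ball_volume) are hypothetical, distances are Euclidean, the neighbor search is brute force, and the normalization psi(n) - psi(k) + log c_d + (d/n) * sum_i log eps_i is one standard variant (others replace psi(n) with log n or log(n - 1)). Here eps_i is the distance from sample i to its k-th nearest neighbor and c_d is the volume of the d-dimensional unit ball.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_int(m):
    """Digamma at a positive integer: psi(m) = -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def log_unit_ball_volume(d):
    """Log volume of the d-dimensional Euclidean unit ball, pi^(d/2) / Gamma(d/2 + 1)."""
    return 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)


def kl_entropy(samples, k=1):
    """Sketch of a Kozachenko-Leonenko style entropy estimate, in nats.

    samples: sequence of equal-length tuples (points in R^d).
    k: which nearest neighbor to use (assumed 1 <= k < n).
    Assumes no duplicate points; a zero k-NN distance would make log() fail.
    """
    n = len(samples)
    d = len(samples[0])
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")

    log_dist_sum = 0.0
    for i, x in enumerate(samples):
        # Brute-force distances from x to every other sample: O(n) per point.
        dists = sorted(math.dist(x, y) for j, y in enumerate(samples) if j != i)
        log_dist_sum += math.log(dists[k - 1])  # k-th nearest neighbor distance

    return (
        digamma_int(n)
        - digamma_int(k)
        + log_unit_ball_volume(d)
        + d * log_dist_sum / n
    )
```

Calling kl_entropy on a few hundred draws from a standard normal, for example, should give a value near the true differential entropy 0.5 * log(2 * pi * e), with the finite-sample bias and variance that the paper's bounds describe. For larger samples, a k-d tree or similar index would replace the quadratic-time neighbor search used here.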


