Bagged k-Distance for Mode-Based Clustering Using the Probability of Localized Level Sets

by Hanyuan Hang, et al.

In this paper, we propose an ensemble learning algorithm named bagged k-distance for mode-based clustering (BDMBC), built on a new measure called the probability of localized level sets (PLLS), which enables us to find all clusters across varying densities with a single global threshold. On the theoretical side, we show that, with a properly chosen number of nearest neighbors k_D in the bagged k-distance, sub-sample size s, number of bagging rounds B, and number of nearest neighbors k_L for the localized level sets, BDMBC achieves optimal convergence rates for mode estimation. Notably, with a relatively small B, the sub-sample size s can be much smaller than the number of training samples n at each bagging round, and the number of nearest neighbors k_D can be reduced simultaneously. Moreover, we establish optimal convergence rates for level set estimation of the PLLS in terms of Hausdorff distance, which shows that BDMBC can recover localized level sets at varying density levels and thus enjoys local adaptivity. On the practical side, numerical experiments empirically verify the effectiveness of BDMBC for mode estimation and level set estimation, demonstrating the accuracy and efficiency of the proposed algorithm.
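The core idea can be illustrated with a minimal sketch: average the k_D-th nearest-neighbor distance over B subsamples of size s (smaller values mean higher density), then score each point by the fraction of its k_L nearest neighbors with a lower density estimate. This is a hypothetical simplification for intuition only, not the paper's exact BDMBC procedure; all function names and parameter defaults here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def bagged_k_distance(X, queries, k_D=5, s=50, B=10, rng=rng):
    """Average the k_D-th nearest-neighbor distance over B subsamples
    of size s. Smaller values indicate higher local density."""
    dists = np.zeros(len(queries))
    for _ in range(B):
        sub = X[rng.choice(len(X), size=s, replace=False)]
        # pairwise distances from each query point to the subsample
        d = np.linalg.norm(queries[:, None, :] - sub[None, :, :], axis=-1)
        # k_D-th smallest distance per query (k_D is 1-indexed)
        dists += np.sort(d, axis=1)[:, k_D - 1]
    return dists / B

def plls(X, k_dist, k_L=10):
    """Localized level-set score: fraction of a point's k_L nearest
    neighbors whose bagged k-distance is at least its own (a crude
    stand-in for the paper's PLLS)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(d, axis=1)[:, 1:k_L + 1]  # exclude the point itself
    return (k_dist[nbrs] >= k_dist[:, None]).mean(axis=1)

# two Gaussian blobs with very different densities
X = np.vstack([rng.normal(0, 0.3, (100, 2)),
               rng.normal(4, 1.0, (100, 2))])
kd = bagged_k_distance(X, X, k_D=5, s=50, B=10)
p = plls(X, kd, k_L=10)
print(p.shape, float(p.min()), float(p.max()))
```

Because the score p is a local rank rather than a raw density, points near the mode of the sparse blob score highly alongside points near the dense blob's mode, which is why a single global threshold on p can pick up clusters of very different densities.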




