Further heuristics for k-means: The merge-and-split heuristic and the (k,l)-means

06/23/2014
by Frank Nielsen, et al.

Finding the optimal k-means clustering is NP-hard in general, and many heuristics have been designed to monotonically decrease the k-means objective. We first show how to extend Lloyd's batched relocation heuristic and Hartigan's single-point relocation heuristic to take into account empty-cluster and single-point cluster events, respectively. These events tend to occur more often as the number of clusters k or the dimension d increases, or when performing several restarts. First, we show that these special events are a blessing because they allow us to partially re-seed some cluster centers while further minimizing the k-means objective function. Second, we describe a novel heuristic, merge-and-split k-means, that consists of merging two clusters and splitting the merged cluster again with two new centers, provided this improves the k-means objective. This novel heuristic can improve Hartigan's k-means when it has converged to a local minimum. We show empirically that merge-and-split k-means improves over Hartigan's heuristic, which is the de facto method of choice. Finally, we propose the (k,l)-means objective, which generalizes the k-means objective by associating each data point with its l closest cluster centers, and show how to either directly convert or iteratively relax the (k,l)-means into a k-means in order to reach better local minima.
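
To make two of the ideas above concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes that the (k,l)-means cost sums, for each point, the squared Euclidean distances to its l nearest centers (so l = 1 gives the ordinary k-means cost), and that an empty cluster in a Lloyd pass is re-seeded at the data point farthest from its nearest non-empty center. The function names `sq_dist`, `kl_means_cost`, and `lloyd_step_with_reseed`, as well as the re-seeding rule itself, are illustrative assumptions; the abstract does not specify them.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kl_means_cost(points, centers, l=1):
    """Assumed (k,l)-means cost: for each point, add the squared distances
    to its l nearest centers. With l = 1 this is the usual k-means cost."""
    total = 0.0
    for p in points:
        dists = sorted(sq_dist(p, c) for c in centers)
        total += sum(dists[:l])
    return total


def lloyd_step_with_reseed(points, centers):
    """One batched Lloyd pass (assign, then move centers to centroids).

    Assumption (hypothetical rule): a cluster that receives no point is
    re-seeded at the data point farthest from its nearest non-empty center.
    Placing a center on a data point cannot increase the k-means cost.
    """
    k = len(centers)
    clusters = [[] for _ in range(k)]
    for p in points:
        nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
        clusters[nearest].append(p)

    # Centroid update; empty clusters are marked with None for now.
    new_centers = []
    for members in clusters:
        if members:
            dim = len(members[0])
            new_centers.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            )
        else:
            new_centers.append(None)

    # Partial re-seeding of the empty clusters, one at a time.
    for j in range(k):
        if new_centers[j] is None:
            live = [c for c in new_centers if c is not None]
            new_centers[j] = max(
                points, key=lambda p: min(sq_dist(p, c) for c in live)
            )
    return new_centers
```

Calling `lloyd_step_with_reseed` repeatedly until `kl_means_cost(points, centers, 1)` stops decreasing gives a basic Lloyd loop that never leaves a cluster empty. Evaluating `kl_means_cost` with l > 1 on the same centers shows how the relaxed objective weighs each point against several centers at once.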
