Exact Acceleration of K-Means++ and K-Means

05/06/2021
by   Edward Raff, et al.
0

K-Means++ and its distributed variant K-Means have become de facto tools for selecting the initial seeds of K-means. While alternatives have been developed, the effectiveness, ease of implementation, and theoretical grounding of the K-means++ and methods have made them difficult to "best" from a holistic perspective. By considering the limited opportunities within seed selection to perform pruning, we develop specialized triangle inequality pruning strategies and a dynamic priority queue to show the first acceleration of K-Means++ and K-Means that is faster in run-time while being algorithmicly equivalent. For both algorithms we are able to reduce distance computations by over 500×. For K-means++ this results in up to a 17× speedup in run-time and a 551× speedup for K-means. We achieve this with simple, but carefully chosen, modifications to known techniques which makes it easy to integrate our approach into existing implementations of these algorithms.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset