Unexpected Effects of Online K-means Clustering
In this paper we study k-means clustering in the online setting. In the offline setting the main parameters are number of centers, k, and size of the dataset, n. Performance guarantees are given as a function of these parameters. In the online setting new factors come into place: the ordering of the dataset and whether n is known in advance or not. One of the main results of this paper is the discovery that these new factors have dramatic effects on the quality of the clustering algorithms. For example, for constant k: (1) Ω(n) centers are needed if the order is arbitrary, (2) if the order is random and n is unknown in advance, the number of centers reduces to Θ(logn), and (3) if n is known, then the number of centers reduces to a constant. For different values of the new factors, we show upper and lower bounds that are exactly the same up to a constant, thus achieving optimal bounds.
READ FULL TEXT