Clustering Stream Data by Exploring the Evolution of Density Mountain

by Shufeng Gong, et al.

Stream clustering is a fundamental problem in many streaming data analysis applications. Compared with classical batch-mode clustering, stream clustering poses two key challenges: (i) given that the input data change continuously, how can clustering results be updated incrementally and efficiently? (ii) given that clusters evolve continuously as the data evolve, how can the cluster evolution activities be captured? Unfortunately, most existing stream clustering algorithms can neither update the clustering result in real time nor track the evolution of clusters. In this paper, we propose a stream clustering algorithm, EDMStream, that explores the Evolution of Density Mountain. The density mountain is used to abstract the data distribution, and changes in it indicate how the data distribution evolves. We track the evolution of clusters by monitoring the changes of density mountains. We further provide efficient data structures and filtering schemes that keep the density mountains updated in real time, which makes online clustering possible. Experimental results on synthetic and real datasets show that, compared with state-of-the-art stream clustering algorithms such as D-Stream, DenStream, DBSTREAM and MR-Stream, our algorithm responds to a cluster update much faster (roughly 7-15x faster than the best of the competitors) while achieving comparable cluster quality. Furthermore, EDMStream can successfully capture the cluster evolution activities.
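The abstract does not spell out how a density mountain is built, so the following is only an illustrative sketch and not the EDMStream algorithm itself. It assumes a density-peaks-style formulation: each point links to its nearest neighbour of higher density, and a point whose link is longer than a threshold becomes the peak of its own mountain. The function name `density_mountains` and the parameters `radius` and `tau` are hypothetical, and the sketch works in batch mode with none of the incremental updates, data structures, or filtering schemes the paper describes.

```python
import math

def density_mountains(points, radius, tau):
    """Batch sketch: group points into 'mountains' (assumed density-peaks style)."""
    n = len(points)
    # Local density: number of other points within `radius` (assumed definition).
    rho = [
        sum(1 for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= radius)
        for i in range(n)
    ]
    # Visit points from highest to lowest density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    parent = [None] * n        # nearest point that precedes i in `order`
    delta = [math.inf] * n     # distance to that point
    for rank, i in enumerate(order):
        for j in order[:rank]:
            d = math.dist(points[i], points[j])
            if d < delta[i]:
                delta[i], parent[i] = d, j
    # A point far from every denser point starts a new mountain;
    # every other point inherits the label of the point it depends on.
    labels = [None] * n
    next_label = 0
    for i in order:
        if parent[i] is None or delta[i] > tau:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[parent[i]]
    return labels
```

Calling `density_mountains(points, radius=1.5, tau=3.0)` on two well-separated groups of 2-D points should place each group in its own mountain. In a streaming setting the densities would change over time, so the links and peaks would have to be maintained incrementally rather than recomputed as they are here.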



Data Stream Clustering: A Review

Number of connected devices is steadily increasing and these devices con...

Online Clustering by Penalized Weighted GMM

With the dawn of the Big Data era, data sets are growing rapidly. Data i...

SECLEDS: Sequence Clustering in Evolving Data Streams via Multiple Medoids and Medoid Voting

Sequence clustering in a streaming environment is challenging because it...

Efficient Dynamic Clustering: Capturing Patterns from Historical Cluster Evolution

Clustering aims to group unlabeled objects based on similarity inherent ...

α-Approximation Density-based Clustering of Multi-valued Objects

Multi-valued data are commonly found in many real applications. During t...

BETULA: Numerically Stable CF-Trees for BIRCH Clustering

BIRCH clustering is a widely known approach for clustering, that has inf...

Benne: A Modular and Self-Optimizing Algorithm for Data Stream Clustering

In various real-world applications, ranging from the Internet of Things ...
