Achieving Approximate Soft Clustering in Data Streams

07/26/2012
by   Vaneet Aggarwal, et al.

In recent years, data streaming has gained prominence as advances in technology have enabled many applications to generate continuous flows of data. This increases the need for algorithms that can process data streams efficiently. In addition, real-time requirements and the evolving nature of data streams make stream mining problems, including clustering, challenging research problems. In this paper, we propose a one-pass streaming soft clustering algorithm (one in which the membership of a point in a cluster is described by a distribution) that approximates the "soft" version of the k-means objective function. Soft clustering has applications in various areas of databases and machine learning, including density estimation and learning mixture models. We first obtain a simple pseudo-approximation in terms of the "hard" k-means algorithm, where the algorithm is allowed to output more than k centers. We then convert this batch algorithm to a streaming one in the "cash register" model, using a recently proposed extension of the k-means++ algorithm. We also extend the algorithm to the case where clustering is performed over a moving window in the data stream.
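To make the terms in the abstract concrete, here is a minimal batch sketch in Python of three ingredients it mentions: D^2 sampling in the style of k-means++ seeding, the "hard" k-means cost, and a soft membership distribution for a single point. It is an illustration under stated assumptions, not the paper's algorithm. The function names (`d2_seed`, `hard_cost`, `soft_memberships`), the softmax weighting, and the `beta` parameter are all hypothetical choices made here; the abstract does not specify how the soft objective is defined or how the streaming extension works.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_seed(points, num_centers, rng):
    """Pick centers by D^2 sampling (k-means++ style seeding).

    num_centers may exceed the target k, mirroring the pseudo-approximation
    setting in which more than k centers are allowed.
    """
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < num_centers:
        if sum(d2) == 0.0:
            break  # every point already coincides with a chosen center
        # Sample an index with probability proportional to d2[i].
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers


def hard_cost(points, centers):
    """Hard k-means cost: each point pays the squared distance to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def soft_memberships(point, centers, beta=1.0):
    """Membership distribution of one point over the centers.

    ASSUMPTION: a softmax over negative squared distances, chosen only for
    illustration; the paper's soft objective may be defined differently.
    """
    dists = [sq_dist(point, c) for c in centers]
    shift = min(dists)  # subtract the minimum for numerical stability
    weights = [math.exp(-beta * (d - shift)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centers = d2_seed(points, 3, rng)  # 3 centers for k = 2: extra centers allowed
cost = hard_cost(points, centers)
memberships = soft_memberships(points[0], centers)
print(centers, cost, memberships)
```

Allowing `num_centers` to exceed k trades a larger output for a lower cost, which is the sense in which a pseudo-approximation relaxes the problem. A one-pass version would additionally have to bound memory while points arrive, which this batch sketch does not attempt.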


research
07/16/2020

Data Stream Clustering: A Review

Number of connected devices is steadily increasing and these devices con...
research
11/21/2019

S-RASTER: Contraction Clustering for Evolving Data Streams

Contraction Clustering (RASTER) is a very fast algorithm for density-bas...
research
07/08/2017

Learning Mixture of Gaussians with Streaming Data

In this paper, we study the problem of learning a mixture of Gaussians w...
research
10/13/2022

Dirichlet process mixture models for non-stationary data streams

In recent years, we have seen a handful of work on inference algorithms ...
research
10/02/2019

Streaming Balanced Clustering

Clustering of data points in metric space is among the most fundamental ...
research
02/28/2018

Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream

In marketing we are often confronted with a continuous stream of respons...
research
06/24/2022

SECLEDS: Sequence Clustering in Evolving Data Streams via Multiple Medoids and Medoid Voting

Sequence clustering in a streaming environment is challenging because it...
