Distributed Bayesian clustering using finite mixture of mixtures

03/31/2020
by Hanyu Song, et al.

In many modern applications, there is interest in analyzing enormous data sets that cannot easily be moved across computers or loaded into memory on a single computer. Clustering is a common goal in such settings. Existing distributed clustering algorithms are mostly distance- or density-based and lack a likelihood specification, which precludes formal statistical inference. Model-based clustering allows statistical inference, yet research on distributed inference has emphasized nonparametric Bayesian mixture models over finite mixture models. To fill this gap, we introduce a nearly embarrassingly parallel algorithm for clustering under a Bayesian overfitted finite mixture of Gaussian mixtures, which we term distributed Bayesian clustering (DIB-C). DIB-C can flexibly accommodate data sets with various shapes (e.g., skewed or multimodal). With the data randomly partitioned and distributed, we first run Markov chain Monte Carlo in an embarrassingly parallel manner to obtain local clustering draws, and then refine these draws across workers to obtain a final clustering estimate based on any loss function on the space of partitions. DIB-C can also estimate cluster densities, quickly classify new subjects, and provide a posterior predictive distribution. Both simulation studies and real data applications show superior performance of DIB-C in terms of robustness and computational efficiency.
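To make "a loss function on the space of partitions" concrete, the sketch below uses one common choice, Binder's loss with equal misclassification costs, to pick a point estimate from a set of MCMC clustering draws. It first builds the posterior co-clustering (similarity) matrix, then returns the sampled partition with the smallest expected loss. This is only an illustration of that generic step under these assumptions; it is not the DIB-C cross-worker refinement procedure, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

Partition = Sequence[int]  # one cluster label per item; label values are arbitrary


def coclustering_matrix(draws: List[Partition]) -> List[List[float]]:
    """Fraction of draws in which items i and j are placed in the same cluster."""
    n = len(draws[0])
    counts = [[0.0] * n for _ in range(n)]
    for z in draws:
        for i in range(n):
            for j in range(n):
                if z[i] == z[j]:
                    counts[i][j] += 1.0
    m = len(draws)
    return [[c / m for c in row] for row in counts]


def expected_binder_loss(z: Partition, psm: List[List[float]]) -> float:
    """Posterior expected Binder loss (equal costs) of partition z:
    sum over pairs i < j of |1{z_i == z_j} - P(i and j co-clustered)|."""
    loss = 0.0
    for i, j in combinations(range(len(z)), 2):
        same = 1.0 if z[i] == z[j] else 0.0
        loss += abs(same - psm[i][j])
    return loss


def least_squares_partition(draws: List[Partition]) -> Partition:
    """Return the sampled partition that minimizes the expected Binder loss."""
    psm = coclustering_matrix(draws)
    return min(draws, key=lambda z: expected_binder_loss(z, psm))


if __name__ == "__main__":
    draws = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
    print(least_squares_partition(draws))
```

Because the loss depends only on which pairs of items share a cluster, it is unaffected by label switching across MCMC iterations: relabelling the clusters within a draw does not change the result.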
