An adequacy approach for deciding the number of clusters for OTRIMLE robust Gaussian mixture based clustering

09/02/2020
by Christian Hennig, et al.

We introduce a new approach to deciding the number of clusters. The approach is applied to Optimally Tuned Robust Improper Maximum Likelihood Estimation (OTRIMLE; Coretto and Hennig 2016) of a Gaussian mixture model that allows observations to be classified as "noise", but it can be applied to other clustering methods as well. The quality of a clustering is assessed by a statistic Q that measures how close the within-cluster distributions are to elliptical unimodal distributions whose only mode is at the mean. This nonparametric measure allows for non-Gaussian clusters as long as they have good quality according to Q. The simplicity of a model is assessed by a measure S that prefers a smaller number of clusters unless additional clusters reduce the estimated noise proportion substantially. The simplest model that is adequate for the data is then chosen, where a model counts as adequate if its observed value of Q is not significantly larger than what would be expected for data truly generated from the fitted model; this can be assessed by a parametric bootstrap. The approach is compared with model-based clustering using the Bayesian Information Criterion (BIC) in a simulation study and on two datasets of scientific interest.

Keywords: parametric bootstrap; noise component; unimodality; model-based clustering
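
The selection rule in the abstract (fit each candidate model, compare its observed Q with the distribution of Q on data simulated from the fitted model, and keep the simplest model that is not rejected) can be illustrated with a small toy sketch. This is not the authors' OTRIMLE procedure, and several assumptions are made loudly here: clusters are found by plain 1-D k-means instead of OTRIMLE, there is no noise component, "simplicity" is just the number of clusters K rather than the measure S, the quality statistic is a hypothetical stand-in (size-weighted mean absolute excess kurtosis within clusters) rather than the paper's Q, and each bootstrap sample is refitted with the same K. All function names are hypothetical.

```python
import random
import statistics


def kmeans_1d(xs, k, iters=50):
    """Plain 1-D Lloyd's algorithm (stand-in for OTRIMLE); returns one label per point."""
    srt = sorted(xs)
    # Deterministic start: centres at evenly spaced quantiles of the data.
    centres = [srt[int((i + 0.5) * len(srt) / k)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(x - centres[j])) for x in xs]
        new = []
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new.append(statistics.fmean(members) if members else centres[j])
        if new == centres:
            break
        centres = new
    return labels


def fit(xs, k):
    """Hypothetical K-cluster model: (weight, mean, sd) per cluster from a hard partition."""
    labels = kmeans_1d(xs, k)
    params = []
    for j in range(k):
        members = [x for x, lab in zip(xs, labels) if lab == j]
        if len(members) >= 2:
            params.append((len(members) / len(xs), statistics.fmean(members),
                           statistics.pstdev(members)))
    return params, labels


def excess_kurtosis(xs):
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m4 = statistics.fmean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3.0


def quality(xs, labels):
    """Stand-in for Q (NOT the paper's statistic): larger means less Gaussian-looking clusters."""
    total = 0.0
    for j in set(labels):
        members = [x for x, lab in zip(xs, labels) if lab == j]
        if len(members) >= 4 and statistics.pvariance(members) > 0:
            total += len(members) / len(xs) * abs(excess_kurtosis(members))
    return total


def simulate(params, n, rng):
    """Draw n points from the fitted Gaussian mixture."""
    comps = rng.choices(params, weights=[w for w, _, _ in params], k=n)
    return [rng.gauss(mu, sd) for _, mu, sd in comps]


def bootstrap_p_value(xs, k, n_boot=99, seed=0):
    """Monte Carlo p-value: how often simulated Q is at least the observed Q."""
    rng = random.Random(seed)
    params, labels = fit(xs, k)
    q_obs = quality(xs, labels)
    exceed = 0
    for _ in range(n_boot):
        sim = simulate(params, len(xs), rng)
        _, sim_labels = fit(sim, k)  # refit with the same K (an assumption)
        if quality(sim, sim_labels) >= q_obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


def choose_k(xs, k_max=4, alpha=0.05):
    """Smallest K whose observed Q is not significantly larger than under its fitted model."""
    for k in range(1, k_max + 1):
        if bootstrap_p_value(xs, k) > alpha:
            return k
    return None  # no candidate judged adequate


# Toy usage: two well-separated 1-D Gaussian groups.
rng = random.Random(1)
data = [rng.gauss(-5, 1) for _ in range(100)] + [rng.gauss(5, 1) for _ in range(100)]
print("p-value for K=1:", bootstrap_p_value(data, 1))
print("chosen K:", choose_k(data))
```

On data like this the single-cluster model should be rejected, because one fitted Gaussian essentially never produces a sample as platykurtic as a bimodal one; whether K=2 is then accepted depends on the bootstrap draws, as with any test at level alpha. The `(exceed + 1) / (n_boot + 1)` form is the usual Monte Carlo p-value, which never returns exactly zero.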


