Model-based clustering of categorical data based on the Hamming distance

12/09/2022
by Raffaele Argiento, et al.

A model-based approach is developed for clustering categorical data with no natural ordering. The proposed method uses the Hamming distance to define a family of probability mass functions for modelling the data. The elements of this family then serve as kernels of a finite mixture model with an unknown number of components. Conjugate Bayesian inference is derived for the parameters of the Hamming distribution model. The mixture is framed in a Bayesian nonparametric setting, and a transdimensional blocked Gibbs sampler is developed to provide full Bayesian inference on the number of clusters, their structure and the group-specific parameters, which simplifies computation relative to customary reversible jump algorithms. When the number of components is fixed, the proposed model includes a parsimonious latent class model as a special case. Model performance is assessed in a simulation study and on reference datasets, showing improved clustering recovery over existing approaches.
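To make the idea of a Hamming-distance kernel concrete, the following is a minimal sketch rather than the authors' exact definition. It assumes a probability mass function proportional to exp(-d_H(x, c) / sigma), where d_H is the Hamming distance to a centre c and sigma is a single shared scale; the names `hamming_pmf`, `center`, `sigma` and `n_levels` are illustrative and do not come from the abstract. Under that assumption the normalizing constant factorizes over attributes, which the final loop checks by brute-force enumeration.

```python
import math
from itertools import product


def hamming_distance(x, c):
    """Count the positions at which two equal-length sequences differ."""
    if len(x) != len(c):
        raise ValueError("x and c must have the same length")
    return sum(xi != ci for xi, ci in zip(x, c))


def hamming_pmf(x, center, sigma, n_levels):
    """Assumed kernel: p(x) proportional to exp(-d_H(x, center) / sigma).

    n_levels[j] is the number of categories of attribute j. Because the
    Hamming distance is a sum over attributes, the normalizing constant
    is the product over j of 1 + (n_levels[j] - 1) * exp(-1 / sigma).
    """
    log_norm = sum(
        math.log(1.0 + (m - 1) * math.exp(-1.0 / sigma)) for m in n_levels
    )
    return math.exp(-hamming_distance(x, center) / sigma - log_norm)


# Sanity check on a small, made-up attribute space: the probabilities of
# all possible observations should sum to one.
n_levels = [3, 2, 4]   # hypothetical numbers of categories per attribute
center = (0, 1, 2)     # hypothetical cluster centre
sigma = 0.7            # hypothetical scale parameter

support = list(product(*(range(m) for m in n_levels)))
total = sum(hamming_pmf(x, center, sigma, n_levels) for x in support)
```

In this sketch the centre is the mode of the distribution, and a smaller `sigma` concentrates more mass on it; a mixture of such kernels would assign each cluster its own centre and scale.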

research
10/09/2022

Bayesian Repulsive Mixture Modeling with Matérn Point Processes

Mixture models are a standard tool in statistical analysis, widely used ...
research
07/21/2023

Longitudinal Data Clustering with a Copula Kernel Mixture Model

Many common clustering methods cannot be used for clustering multivariat...
research
12/16/2022

Modelling and analysis of rank ordered data with ties via a generalized Plackett-Luce model

A simple generative model for rank ordered data with ties is presented. ...
research
05/20/2020

Dynamic mixtures of finite mixtures and telescoping sampling

Within a Bayesian framework, a comprehensive investigation of the model ...
research
04/19/2015

Exploring Bayesian Models for Multi-level Clustering of Hierarchically Grouped Sequential Data

A wide range of Bayesian models have been proposed for data that is divi...
research
08/21/2019

Clustering Longitudinal Life-Course Sequences using Mixtures of Exponential-Distance Models

Sequence analysis is an increasingly popular approach for the analysis o...
research
08/24/2017

GALILEO: A Generalized Low-Entropy Mixture Model

We present a new method of generating mixture models for data with categ...
