Nested partitions from hierarchical clustering statistical validation

06/17/2019
by Christian Bongiorno, et al.

We develop a fast, scalable greedy algorithm that detects a nested partition in a dendrogram obtained from hierarchical clustering of a multivariate series. Our algorithm provides a p-value for each clade observed in the hierarchical tree. The p-value is obtained by computing a number of bootstrap replicas of the dissimilarity matrix and by performing a statistical test on the difference between the dissimilarity associated with a given clade and the dissimilarity associated with the clade of its parent node. We demonstrate the efficacy of our algorithm on a set of benchmarks generated with a hierarchical factor model. We compare its results with those of Pvclust, a widely used algorithm that takes a global approach and was originally motivated by phylogenetic studies. In our numerical experiments we focus on the role of multiple hypothesis test correction and on the robustness of the algorithms to inaccuracies and errors in the datasets. We also apply our algorithm to a reference empirical dataset. We verify that our algorithm is much faster than Pvclust and scales better in both the number of elements and the number of records of the investigated multivariate set. Because it produces a hierarchically nested partition in much less time than currently widely used algorithms, it makes statistically validated cluster detection feasible in very large systems.
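
The bootstrap test described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the correlation-based dissimilarity, the average-linkage dendrogram, the row-resampling bootstrap, and the one-sided rule (the fraction of replicas in which a clade is not tighter than its parent) are all assumptions made for illustration, and the function names `dissimilarity`, `clade_height`, and `clade_pvalues` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform


def dissimilarity(x):
    """Assumed choice: d = sqrt(2 * (1 - rho)) from the correlation matrix."""
    rho = np.corrcoef(x, rowvar=False)
    return np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))


def clade_height(d, left, right):
    """Average dissimilarity between the two child groups of a clade
    (this equals the node height under average linkage)."""
    return d[np.ix_(left, right)].mean()


def clade_pvalues(x, n_boot=200, seed=0):
    """Return {node_id: p-value} for every non-root clade of the dendrogram.

    x has shape (records, elements). The dendrogram is built once on the
    original data; each bootstrap replica only re-evaluates clade heights.
    """
    rng = np.random.default_rng(seed)
    d = dissimilarity(x)
    root = to_tree(linkage(squareform(d, checks=False), method="average"))

    # Pair each non-root clade's split with the split of its parent node.
    pairs = []
    stack = [(root, None)]
    while stack:
        node, parent_split = stack.pop()
        if node.is_leaf():
            continue
        split = (node.left.pre_order(), node.right.pre_order())
        if parent_split is not None:
            pairs.append((node.id, split, parent_split))
        stack.append((node.left, split))
        stack.append((node.right, split))

    counts = {node_id: 0 for node_id, _, _ in pairs}
    n_records = x.shape[0]
    for _ in range(n_boot):
        # Bootstrap replica: resample records (rows) with replacement.
        rows = rng.integers(0, n_records, size=n_records)
        db = dissimilarity(x[rows])
        for node_id, split, parent_split in pairs:
            # Count replicas where the clade is NOT tighter than its parent.
            if clade_height(db, *split) >= clade_height(db, *parent_split):
                counts[node_id] += 1
    return {node_id: c / n_boot for node_id, c in counts.items()}
```

Calling `clade_pvalues(x)` on an array of shape (records, elements) returns one p-value per non-root clade, keyed by SciPy's node id. Since one test is run per clade, a multiple hypothesis test correction would still have to be applied to these p-values before selecting the validated clades; the sketch leaves that step out.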

