An agglomerative hierarchical clustering method by optimizing the average silhouette width
An agglomerative hierarchical clustering (AHC) framework and algorithm named HOSil is proposed, based on a new linkage metric optimized by the average silhouette width (ASW) index. A thorough investigation of various clustering methods and estimation indices is conducted across a diverse variety of data structures with three aims: (a) clustering quality, (b) clustering recovery, and (c) estimation of the number of clusters. HOSil has shown better clustering quality on a range of artificial and real-world data structures than k-means, PAM, single, complete, average, Ward, and McQuitty linkage, spectral clustering, model-based clustering, and several estimation methods. It can identify clusters of various shapes, including spherical, elongated, and relatively small clusters, as well as clusters drawn from different distributions, including uniform, t, gamma, and others. HOSil has also shown good recovery of the correct number of clusters. For some data structures, only HOSil was able to identify the correct number of clusters.
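To make the idea of ASW-driven merging concrete, the following is a minimal sketch, not the authors' HOSil implementation. It assumes Euclidean distance, the common convention that a point in a singleton cluster has silhouette 0, and a naive greedy rule that at each step merges the pair of clusters whose union yields the highest ASW. The function names `average_silhouette_width` and `greedy_asw_merge` are hypothetical, and the brute-force search is only practical for small data sets.

```python
from itertools import combinations
from math import dist


def average_silhouette_width(points, labels):
    """Mean silhouette over all points (assumed convention: singletons score 0)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("ASW needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: silhouette taken as 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def greedy_asw_merge(points):
    """Illustrative greedy AHC: merge the pair of clusters that maximizes ASW.

    Returns (k, asw, labels) for the level of the hierarchy with the best ASW.
    """
    if len(points) < 3:
        raise ValueError("this sketch needs at least three points")
    labels = list(range(len(points)))  # start from singleton clusters
    history = []
    while len(set(labels)) > 2:
        best_score, best_labels = None, None
        for p, q in combinations(sorted(set(labels)), 2):
            # Candidate partition: relabel cluster q as cluster p
            candidate = [p if lab == q else lab for lab in labels]
            score = average_silhouette_width(points, candidate)
            if best_score is None or score > best_score:
                best_score, best_labels = score, candidate
        labels = best_labels
        history.append((len(set(labels)), best_score, labels))
    # Estimate the number of clusters as the level with the highest ASW
    return max(history, key=lambda level: level[1])


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    k, asw, labels = greedy_asw_merge(toy)
    print(k, round(asw, 3), labels)
```

Because every level of the hierarchy is scored with the same index that drives the merges, the level with the highest ASW doubles as an estimate of the number of clusters, which is the third aim listed above.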