Forest Fire Clustering: Cluster-oriented Label Propagation Clustering and Monte Carlo Verification Inspired by Forest Fire Dynamics

03/22/2021
by Zhanlin Chen, et al.

Clustering methods group data points together and assign them group-level labels. However, it has been difficult to evaluate the confidence of clustering results. Here, we introduce a novel method that not only finds robust clusters but also provides a confidence score for the label of each data point. Specifically, we reformulate label-propagation clustering to model forest fire dynamics. The method has only one parameter: a fire temperature term describing how easily a label propagates from one node to the next. By iteratively starting label propagations through a graph, we can discover the number of clusters in a dataset with minimal prior assumptions. Further, we can validate our predictions and uncover the posterior probability distribution of the labels using Monte Carlo simulations. Lastly, our iterative method is inductive and does not need to be retrained when new data arrive. Here, we describe the method and provide a summary of how it performs on common clustering benchmarks.
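The abstract does not state the update rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a Gaussian-kernel affinity graph, and a hypothetical ignition rule in which an unlabeled node takes the current label when `fire_temp` times its mean affinity to the already-labeled nodes of that cluster exceeds a fixed `threshold`. The function names, `sigma`, `threshold`, and the co-assignment frequency used as a stand-in for the Monte Carlo confidence estimate are all illustrative assumptions.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian-kernel affinity matrix (assumed graph construction)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


def forest_fire_sketch(A, fire_temp=1.0, threshold=0.5, rng=None):
    """Iteratively start label propagations ("fires") until every node is labeled.

    Hypothetical rule: an unlabeled node ignites when
    fire_temp * (mean affinity to the burning cluster) > threshold.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    labels = np.full(n, -1)
    current = 0
    while (labels == -1).any():
        # Start a new fire at a random unlabeled node.
        seed = rng.choice(np.flatnonzero(labels == -1))
        labels[seed] = current
        while True:
            burning = labels == current
            heat = fire_temp * A[burning].mean(axis=0)
            ignite = (labels == -1) & (heat > threshold)
            if not ignite.any():
                break  # fire has burned out; the cluster is complete
            labels[ignite] = current
        current += 1
    return labels


def coassignment_frequency(A, n_runs=20, seed=0, **kwargs):
    """Monte Carlo proxy for label confidence (an assumption, not the paper's
    estimator): fraction of randomized runs in which two points share a label."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    together = np.zeros((n, n))
    for _ in range(n_runs):
        labels = forest_fire_sketch(A, rng=rng, **kwargs)
        together += labels[:, None] == labels[None, :]
    return together / n_runs


# Usage on two small, well-separated toy blobs.
X = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3],
              [10.0, 10.0], [10.3, 10.0], [10.0, 10.3]])
A = gaussian_affinity(X, sigma=1.0)
labels = forest_fire_sketch(A, fire_temp=1.0, rng=0)
freq = coassignment_frequency(A, n_runs=20, seed=0)
```

In this sketch the number of clusters is never supplied; it falls out of how many fires are needed to label every node. Raising `fire_temp` lets a fire spread across weaker affinities and so tends to yield fewer, larger clusters.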

