Careful seeding for the k-medoids algorithm with incremental k++ cluster construction

07/06/2022
by Difei Cheng, et al.

The k-medoids algorithm is a popular variant of the k-means algorithm and is widely used in pattern recognition and machine learning. A main drawback of the k-medoids algorithm is that it can become trapped in local optima. An improved k-medoids algorithm (INCKM) was recently proposed to overcome this drawback by constructing a candidate medoid subset with a parameter-selection procedure, but it may fail on imbalanced datasets. In this paper, we propose a novel incremental k-medoids algorithm (INCKPP) that dynamically increases the number of clusters from 2 to k through a nonparametric, stochastic k-means++ search procedure. Our algorithm avoids the parameter selection problem of the improved k-medoids algorithm, improves clustering performance, and handles imbalanced datasets well. However, it is computationally expensive. To address this issue, we propose a fast INCKPP algorithm (called INCKPP_sample) that preserves the computational efficiency of the simple and fast k-medoids algorithm while improving clustering performance. The proposed algorithm is compared with three state-of-the-art algorithms: the improved k-medoids algorithm (INCKM), the simple and fast k-medoids algorithm (FKM), and the k-means++ algorithm (KPP). Extensive experiments on both synthetic and real-world datasets, including imbalanced datasets, illustrate the effectiveness of the proposed algorithm.
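The abstract gives only the outline of INCKPP, so the following Python sketch is an illustration of the general idea rather than the authors' procedure. It assumes that each new medoid is drawn by k-means++-style sampling (probability proportional to squared distance from the nearest current medoid) and that the medoids are then refined by a plain alternating assign/update loop. The function names (`pairwise_distances`, `d2_sample`, `alternate`, `incremental_kpp_medoids`) and the choice of Euclidean distance are hypothetical, and the sampling shortcut used by INCKPP_sample is not shown.

```python
import numpy as np


def pairwise_distances(X):
    """Dense Euclidean distance matrix (O(n^2) memory; fine for a sketch)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def d2_sample(D, medoids, rng):
    """Draw one point index with probability proportional to the squared
    distance to its nearest current medoid (k-means++-style seeding)."""
    d2 = D[:, medoids].min(axis=1) ** 2
    total = d2.sum()
    if total == 0:
        # Every point coincides with a medoid; fall back to a uniform pick.
        candidates = np.setdiff1d(np.arange(D.shape[0]), medoids)
        return int(rng.choice(candidates))
    return int(rng.choice(D.shape[0], p=d2 / total))


def alternate(D, medoids, max_iter=100):
    """Alternate between assigning points to the nearest medoid and moving
    each medoid to the cluster member with the smallest total distance."""
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = D[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for j in range(len(medoids)):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if its cluster is empty
            costs = D[np.ix_(members, members)].sum(axis=1)
            new[j] = members[costs.argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels


def incremental_kpp_medoids(X, k, seed=0):
    """Grow the clustering from 2 to k medoids (assumed scheme, k >= 2)."""
    rng = np.random.default_rng(seed)
    D = pairwise_distances(X)
    first = int(rng.integers(len(X)))
    medoids = [first, d2_sample(D, [first], rng)]
    medoids, labels = alternate(D, medoids)
    while len(medoids) < k:
        new = d2_sample(D, medoids, rng)
        medoids, labels = alternate(D, np.append(medoids, new))
    cost = D[np.arange(len(X)), medoids[labels]].sum()
    return medoids, labels, cost
```

Calling `incremental_kpp_medoids(X, k=3)` on an `(n, d)` array returns the medoid indices, the cluster label of each point, and the total distance of all points to their assigned medoids. Because seeding is stochastic, different `seed` values can give different clusterings.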


