Integrating Unsupervised Clustering and Label-specific Oversampling to Tackle Imbalanced Multi-label Data

09/25/2021
by Payel Sadhukhan, et al.

Multi-label datasets often contain a mixture of very frequent and very infrequent labels. This variation in label frequency, a type of class imbalance, creates a significant challenge for building efficient multi-label classification algorithms. In this paper, we tackle this problem by proposing a minority class oversampling scheme, UCLSO, which integrates Unsupervised Clustering and Label-Specific data Oversampling. Clustering is performed, irrespective of the label information, to identify the key distinct and locally connected regions of a multi-label dataset. Next, for each label, we explore the distribution of its minority points across the clusters. Synthetic minority points for oversampling are generated only from minority points that lie within the same cluster. Even though the cluster set is the same across all labels, the distribution of the synthetic minority points varies from label to label. The training dataset is augmented with the label-specific synthetic minority points, and classifiers are trained to predict the relevance of each label independently. Experiments using 12 multi-label datasets and several multi-label algorithms show that the proposed method performs very well compared with the competing algorithms.
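The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, the interpolation rule, or how synthetic points are allocated to clusters. The sketch assumes plain k-means for the label-agnostic clustering, SMOTE-style linear interpolation between two minority points of the same cluster, a uniform random choice among usable clusters, and that the value 1 marks the minority class of each label. The names `kmeans` and `oversample_label` are hypothetical.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering).
    Labels are never consulted. Returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Recompute each center; an empty cluster keeps its previous center.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def oversample_label(points, labels, assign, n_new, seed=0):
    """Generate n_new synthetic minority points for ONE binary label column.
    Each synthetic point interpolates two minority points of the same cluster.
    Assumption: label value 1 marks the minority class."""
    rng = random.Random(seed)
    # Group this label's minority points by the shared cluster assignment.
    by_cluster = {}
    for p, y, a in zip(points, labels, assign):
        if y == 1:
            by_cluster.setdefault(a, []).append(p)
    # Interpolation needs at least two minority points in a cluster.
    usable = [m for m in by_cluster.values() if len(m) >= 2]
    synthetic = []
    if not usable:
        return synthetic
    for _ in range(n_new):
        members = rng.choice(usable)   # assumed: uniform choice of cluster
        p, q = rng.sample(members, 2)  # two distinct minority points
        t = rng.random()               # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(p, q)])
    return synthetic

# Toy data: two well-separated regions and two labels (made up for illustration).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]]
Y = [[1, 0], [1, 0], [0, 0], [0, 1], [0, 1], [0, 0]]

assign = kmeans(X, k=2)  # one clustering, shared by every label
synthetic = {
    j: oversample_label(X, [row[j] for row in Y], assign, n_new=3, seed=j)
    for j in range(2)
}
```

In this toy run the cluster assignment is computed once and reused for both labels, while the synthetic points differ per label because each label has its own minority points. Each label's augmented training set would then be `X` plus `synthetic[j]`, fed to an independent binary classifier.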

