Automated Imbalanced Classification via Layered Learning

by Vítor Cerqueira, et al.

In this paper we address imbalanced binary classification (IBC) tasks. A common approach to these problems is to apply resampling strategies that balance the class distribution of the training instances. Many state-of-the-art methods drive the resampling process using instances of interest close to the decision boundary. However, under-sampling the majority class may discard important information, and over-sampling may increase the risk of overfitting by propagating the information contained in minority class instances. The main contribution of our work is a new method for tackling IBC tasks, called ICLL, which is not based on resampling training observations. Instead, ICLL follows a layered learning paradigm to model the data in two stages. In the first layer, ICLL learns to distinguish cases close to the decision boundary from cases that clearly belong to the majority class, where this dichotomy is defined using a hierarchical clustering analysis. In the second layer, we use the instances close to the decision boundary and the instances from the minority class to solve the original predictive task. A second contribution of our work is the automatic definition of the layers that make up the layered learning strategy, using a hierarchical clustering model. This is relevant because the process is usually performed manually according to domain knowledge. We carried out extensive experiments using 100 benchmark data sets. The results show that the proposed method leads to better performance than several state-of-the-art methods for IBC.
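The two-stage idea can be illustrated with a small, dependency-free Python sketch. It is not the authors' ICLL implementation. The following are all assumptions chosen for brevity: single-linkage agglomerative clustering with a fixed `n_clusters`, the rule that any cluster containing at least one minority instance marks the "boundary" region, a nearest-centroid router in the first layer, and a 1-nearest-neighbour learner in the second. The names `LayeredClassifier`, `agglomerative` and `nn_predict` are hypothetical.

```python
import math


def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering; returns a cluster id per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the two closest clusters
        del clusters[j]                  # j > i, so index i stays valid
    assignment = [0] * len(points)
    for cid, members in enumerate(clusters):
        for m in members:
            assignment[m] = cid
    return assignment


def centroid(pts):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def nn_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction (hypothetical stand-in learner)."""
    idx = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[idx]


class LayeredClassifier:
    """Hypothetical two-layer model; label 0 = majority, 1 = minority."""

    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters

    def fit(self, X, y):
        assign = agglomerative(X, self.n_clusters)
        # Assumption: clusters containing any minority instance form the
        # region near the decision boundary; the rest are clearly majority.
        mixed = {c for c, label in zip(assign, y) if label == 1}
        # Layer 1: one centroid per cluster, tagged 1 (boundary) or 0 (majority).
        self.centroids = []
        for c in sorted(set(assign)):
            members = [x for x, a in zip(X, assign) if a == c]
            self.centroids.append((centroid(members), 1 if c in mixed else 0))
        # Layer 2: trained only on instances inside the boundary clusters.
        self.l2_X = [x for x, a in zip(X, assign) if a in mixed]
        self.l2_y = [lab for lab, a in zip(y, assign) if a in mixed]
        return self

    def predict(self, X):
        preds = []
        for x in X:
            _, region = min(self.centroids, key=lambda cr: math.dist(cr[0], x))
            if region == 0:
                preds.append(0)  # layer 1 says: clearly majority class
            else:
                preds.append(nn_predict(self.l2_X, self.l2_y, x))  # layer 2
        return preds


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    model = LayeredClassifier(n_clusters=2).fit(X, y)
    print(model.predict([(0.2, 0.3), (11.2, 10.4), (9.8, 10.6)]))  # [0, 1, 0]
```

In this toy data the four points near the origin form a pure-majority cluster, so they never reach the second layer. The second-layer learner is trained only on the four points around (10.5, 10.5), where the two classes actually overlap. No training instance is duplicated or discarded, which is the contrast with resampling that the abstract draws.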


Effective Decision Boundary Learning for Class Incremental Learning

Rehearsal approaches in class incremental learning (CIL) suffer from dec...

Synthetic Over-sampling with the Minority and Majority classes for imbalance problems

Class imbalance is a substantial challenge in classifying many real-worl...

Gamma distribution-based sampling for imbalanced data

Imbalanced class distribution is a common problem in a number of fields ...

A Hybrid Approach for Binary Classification of Imbalanced Data

Binary classification with an imbalanced dataset is challenging. Models ...

G-SMOTE: A GMM-based synthetic minority oversampling technique for imbalanced learning

Imbalanced Learning is an important learning algorithm for the classific...

Counterfactual-based minority oversampling for imbalanced classification

A key challenge of oversampling in imbalanced classification is that the...

Box Drawings for Learning with Imbalanced Data

The vast majority of real world classification problems are imbalanced, ...
