Feature Selection for Imbalanced Data with Deep Sparse Autoencoders Ensemble

by Michela C. Massi, et al.

Class imbalance is a common issue in many domains where learning algorithms are applied, and in those same domains it is often far more important to correctly classify and profile minority-class observations. This need can be addressed by Feature Selection (FS), which offers several further advantages, such as reduced computational cost and improved inference and interpretability. However, traditional FS techniques may become sub-optimal in the presence of strongly imbalanced data. To retain the advantages of FS in this setting, we propose a filter FS algorithm that ranks feature importance on the basis of the Reconstruction Error of a Deep Sparse AutoEncoders Ensemble (DSAEE). Each DSAE is trained only on the majority class and then used to reconstruct both classes. From the aggregated Reconstruction Error, we identify the features on which the minority class presents a distribution of values different from that of the overrepresented class, i.e., the features most relevant for discriminating between the two. We empirically demonstrate the efficacy of our algorithm in several experiments on high-dimensional datasets of varying sample size, showing its ability to select relevant, generalizable features for profiling and classifying the minority class, outperforming other benchmark FS methods. We also briefly present a real application in radiogenomics, where the methodology was applied successfully.
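The core ranking idea in the abstract can be sketched in a few lines. The sketch below is an illustrative simplification, not the authors' implementation: PCA is used as a linear stand-in for each deep sparse autoencoder, and bootstrap resampling stands in for ensemble diversity. The function name, the toy data, and all parameters are assumptions made for the example; the logic it demonstrates is the one described above: fit reconstruction models on the majority class only, reconstruct both classes, and rank features by the per-feature gap in reconstruction error.

```python
import numpy as np
from sklearn.decomposition import PCA

def rank_features_by_reconstruction_gap(X_major, X_minor, n_components=2,
                                        n_models=5, seed=0):
    """Rank features by how much worse the minority class is reconstructed
    by models fit on the majority class only (hypothetical helper;
    PCA replaces the deep sparse autoencoders of the paper)."""
    rng = np.random.default_rng(seed)
    gaps = np.zeros(X_major.shape[1])
    for _ in range(n_models):
        # Bootstrap the majority class to mimic ensemble diversity.
        idx = rng.integers(0, len(X_major), len(X_major))
        model = PCA(n_components=n_components).fit(X_major[idx])
        # Reconstruct both classes with a model that has never seen the minority.
        rec_major = model.inverse_transform(model.transform(X_major))
        rec_minor = model.inverse_transform(model.transform(X_minor))
        err_major = ((X_major - rec_major) ** 2).mean(axis=0)
        err_minor = ((X_minor - rec_minor) ** 2).mean(axis=0)
        # Aggregate the per-feature reconstruction-error gap across the ensemble.
        gaps += err_minor - err_major
    gaps /= n_models
    # Features with the largest gap are where the minority class
    # distributes differently, i.e. the most discriminative ones.
    return np.argsort(gaps)[::-1]

# Toy data: feature 0 is shifted in the minority class; features 1-2
# carry most of the majority-class variance but do not separate classes.
rng = np.random.default_rng(42)
X_major = rng.normal(0, 1, (500, 3))
X_major[:, 1:] *= 3.0
X_minor = rng.normal(0, 1, (50, 3))
X_minor[:, 1:] *= 3.0
X_minor[:, 0] += 4.0  # the informative, class-separating feature

ranking = rank_features_by_reconstruction_gap(X_major, X_minor)
print(ranking)  # feature 0 is expected to rank first
```

Because the models never see minority examples, features on which the two classes agree are reconstructed equally well for both, while class-separating features accumulate a large error gap; the filter-style output is a ranking, which can then be thresholded to select a feature subset.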




Cost-Sensitive Feature Selection by Optimizing F-Measures

Feature selection is beneficial for improving the performance of general...

On dynamic ensemble selection and data preprocessing for multi-class imbalance learning

Class-imbalance refers to classification problems in which many more ins...

Fractal Autoencoders for Feature Selection

Feature selection reduces the dimensionality of data by identifying a su...

Feature Selection using Sparse Adaptive Bottleneck Centroid-Encoder

We introduce a novel nonlinear model, Sparse Adaptive Bottleneck Centroi...

Quick and Robust Feature Selection: the Strength of Energy-efficient Sparse Training for Autoencoders

Major complications arise from the recent increase in the amount of high...

Classification Trees for Imbalanced and Sparse Data: Surface-to-Volume Regularization

Classification algorithms face difficulties when one or more classes hav...

FIB: A Method for Evaluation of Feature Impact Balance in Multi-Dimensional Data

Errors might not have the same consequences depending on the task at han...
