Ensembling improves stability and power of feature selection for deep learning models

10/02/2022
by   Prashnna K Gyawali, et al.

With the growing adoption of deep learning models in real-world domains, including computational biology, it is often necessary to understand which data features are essential to a model's decisions. Despite extensive recent efforts to define feature importance metrics for deep learning models, we find that the inherent stochasticity in the design and training of deep learning models makes commonly used feature importance scores unstable: different runs of the same model can explain its decisions with, or select, different features. We demonstrate how the signal strength of features and the correlation among features directly contribute to this instability. To address it, we explore ensembling the feature importance scores of models across different epochs and find that this simple approach substantially mitigates the issue. As a case study, we consider knockoff inference, which allows feature selection with statistical guarantees. We observe considerable variability in the features selected at different epochs of training, and the best selection does not necessarily occur at the lowest validation loss, the conventional criterion for choosing the best model. We therefore present a framework that combines the feature importance scores of trained models across different hyperparameter settings and epochs: instead of selecting features from a single best model, we ensemble the feature importance scores of many good models. Across experiments on simulated and various real-world datasets, we demonstrate that the proposed framework consistently improves the power of feature selection.
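The core idea of the ensembling step can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it simply averages per-checkpoint importance scores into a single ensemble score and ranks features by it. The function names and the toy scores are illustrative assumptions.

```python
def ensemble_importance(per_epoch_scores):
    """Element-wise mean of feature-importance scores across checkpoints.

    per_epoch_scores: list of equal-length lists, one per saved epoch,
    each holding an importance value for every feature.
    """
    n = len(per_epoch_scores)
    return [sum(vals) / n for vals in zip(*per_epoch_scores)]

# Toy example (illustrative values): 3 checkpoints, 4 features.
scores = [
    [0.9, 0.1, 0.5, 0.0],
    [0.8, 0.2, 0.4, 0.1],
    [0.7, 0.0, 0.6, 0.2],
]
ensembled = ensemble_importance(scores)          # [0.8, 0.1, 0.5, 0.1]

# Rank features by ensembled importance, highest first.
ranking = sorted(range(len(ensembled)), key=lambda i: ensembled[i], reverse=True)
top_2 = ranking[:2]                              # features 0 and 2
```

Averaging over many reasonably good checkpoints, rather than trusting the single checkpoint with the lowest validation loss, is what smooths out the run-to-run instability the abstract describes; in the paper's knockoff setting, the ensembled scores would then feed the knockoff filter rather than a simple top-k cut.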


