Consistent Nonparametric Different-Feature Selection via the Sparsest k-Subgraph Problem

07/31/2017
by Satoshi Hara, et al.

Two-sample feature selection is the problem of finding features that describe a difference between two probability distributions, a problem that is ubiquitous in both scientific and engineering studies. However, existing methods have limited applicability because of their restrictive assumptions on data distributions or their computational difficulty. In this paper, we resolve these difficulties by formulating the problem as a sparsest k-subgraph problem. The proposed method is nonparametric and does not assume any specific parametric models on the data distributions. We show that the proposed method is computationally efficient and does not require any extra computation for model selection. Moreover, we prove that the proposed method provides a consistent estimator of features under mild conditions. Our experimental results show that the proposed method outperforms the current method with regard to both accuracy and computation time.
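For readers unfamiliar with the sparsest k-subgraph problem named above, the following is a minimal sketch of the generic combinatorial problem only: given a symmetric edge-weight matrix, choose k nodes whose induced subgraph has the smallest total edge weight. The function name `sparsest_k_subgraph`, the toy weight matrix, and the brute-force search are illustrative assumptions; the abstract does not specify how the paper builds the graph from data, and this exhaustive search is not the efficient procedure the paper refers to.

```python
from itertools import combinations


def sparsest_k_subgraph(weights, k):
    """Brute-force search for k nodes with minimum total induced edge weight.

    `weights` is a symmetric matrix (list of lists); only off-diagonal
    entries are counted. Exponential in n, so suitable for toy inputs only.
    """
    n = len(weights)
    best_subset, best_weight = None, float("inf")
    for subset in combinations(range(n), k):
        # Sum the weights of all edges inside the candidate subset.
        total = sum(weights[i][j] for i, j in combinations(subset, 2))
        if total < best_weight:
            best_subset, best_weight = subset, total
    return best_subset, best_weight


# Hypothetical 4-node weight matrix (not taken from the paper).
weights = [
    [0, 5, 1, 4],
    [5, 0, 6, 2],
    [1, 6, 0, 3],
    [4, 2, 3, 0],
]

subset, weight = sparsest_k_subgraph(weights, 3)
print(subset, weight)
```

In a feature selection reading, nodes would correspond to features and k to the number of features retained, but how the weights are defined is left to the full text.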
