Curvature-based Feature Selection with Application in Classifying Electronic Health Records

01/10/2021
by Zheming Zuo, et al.

Electronic Health Records (EHRs) are now widely used in healthcare facilities. Because EHRs are inherently heterogeneous, imbalanced, incomplete, and high-dimensional, it is challenging to apply machine learning algorithms to them for prediction and diagnostics within the scope of precision medicine. Dimensionality reduction is an efficient data preprocessing technique for high-dimensional data: it reduces the number of features while improving the performance of the subsequent analysis, e.g. classification. In this paper, we propose an efficient curvature-based feature selection method for supporting more precise diagnosis. The proposed method is a filter-based feature selection method that directly uses the Menger curvature to rank all the attributes in a given data set. We evaluate our method against conventional PCA and against recent methods including BPCM, GSAM, WCNN, BLS II, VIBES, 2L-MJFA, RFGA, and VAF. Our method achieves state-of-the-art performance on four benchmark healthcare data sets (CCRFDS, BCCDS, BTDS, and DRDDS), with improvements of up to 24.73% on BTDS and CCRFDS and of 7.97% and 3.63% on the other two. The source code is available at https://github.com/zhemingzuo/CFS.
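To make the central quantity concrete, here is a minimal sketch of the Menger curvature of three points in the plane, which is the reciprocal of the radius of the circle through them: 4 times the triangle area divided by the product of the three side lengths. The surrounding scoring and ranking logic is an assumption made purely for illustration. The abstract does not say how attributes are mapped to 2-D points or whether higher or lower curvature ranks first, so the helpers `mean_curvature` and `rank_features` are hypothetical and should not be read as the authors' procedure (see the linked repository for that).

```python
import math


def menger_curvature(p, q, r):
    """Menger curvature of three 2-D points: 4 * area / (|pq| * |qr| * |rp|)."""
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    # The cross product gives twice the signed triangle area.
    cross = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)
    a = math.hypot(x2 - x1, y2 - y1)
    b = math.hypot(x3 - x2, y3 - y2)
    c = math.hypot(x1 - x3, y1 - y3)
    if a == 0.0 or b == 0.0 or c == 0.0:
        # Coincident points: curvature is undefined; returning 0 is an assumption.
        return 0.0
    # 4 * area == 2 * |cross|
    return 2.0 * abs(cross) / (a * b * c)


def mean_curvature(points):
    """Hypothetical score: average curvature over consecutive point triples."""
    triples = list(zip(points, points[1:], points[2:]))
    if not triples:
        return 0.0
    return sum(menger_curvature(p, q, r) for p, q, r in triples) / len(triples)


def rank_features(feature_points):
    """Hypothetical ranking: feature names sorted by descending mean curvature.

    `feature_points` maps a feature name to a list of 2-D points. How those
    points are built from the data, and the sort direction, are assumptions.
    """
    scores = {name: mean_curvature(pts) for name, pts in feature_points.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    circle = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]  # unit circle
    line = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]                  # collinear
    print(menger_curvature(*circle[:3]))
    print(rank_features({"flat": line, "curved": circle}))
```

Three points on the unit circle give a curvature of 1 (up to floating-point rounding), and collinear points give 0, which is a quick sanity check for the formula.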


