Nonlinear Feature Aggregation: Two Algorithms driven by Theory

06/19/2023
by Paolo Bonetti, et al.

Many real-world machine learning applications involve a very large number of features, which causes computational and memory issues and increases the risk of overfitting. Ideally, only relevant and non-redundant features should be retained, preserving the information in the original data while limiting dimensionality. Dimensionality reduction and feature selection are common preprocessing techniques for handling high-dimensional data efficiently. Dimensionality reduction methods control the number of features in the dataset while preserving its structure and minimizing information loss. Feature selection aims to identify the most relevant features for a task, discarding the less informative ones. Previous works have proposed approaches that aggregate features according to their correlation, without discarding any of them, and that preserve interpretability by aggregating with the mean. A limitation of correlation-based methods is that they assume a linear relationship between features and target. In this paper, we relax this assumption in two ways. First, we propose a bias-variance analysis for general models with additive Gaussian noise, leading to a dimensionality reduction algorithm (NonLinCFA) that aggregates non-linear transformations of features with a generic aggregation function. Then, we extend the approach by assuming that a generalized linear model governs the relationship between features and target. A deviance analysis leads to a second dimensionality reduction algorithm (GenLinCFA), applicable to a larger class of regression problems and to classification settings. Finally, we test the algorithms on synthetic and real-world datasets, on both regression and classification tasks, where they show competitive performance.
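To make the idea of correlation-based aggregation with the mean concrete, the following is a minimal sketch of the general technique the abstract attributes to previous works. It is not NonLinCFA or GenLinCFA: the function name `aggregate_correlated_features`, the greedy grouping strategy, and the fixed `threshold` are all assumptions made for illustration, whereas the paper derives its aggregation criteria from a bias-variance or deviance analysis that the abstract does not spell out.

```python
import numpy as np


def aggregate_correlated_features(X, threshold=0.9):
    """Greedily group columns of X and replace each group by its mean.

    Hypothetical illustration: a column joins a group when its Pearson
    correlation with the group's first (seed) column exceeds `threshold`.
    The fixed threshold is an assumption, not the paper's criterion.
    """
    n_features = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)  # feature-by-feature correlations
    unassigned = list(range(n_features))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        group = [seed]
        # Iterate over a copy because we remove items while scanning.
        for j in list(unassigned):
            if corr[seed, j] > threshold:
                group.append(j)
                unassigned.remove(j)
        groups.append(group)
    # No feature is discarded: each one contributes to exactly one mean.
    reduced = np.column_stack([X[:, g].mean(axis=1) for g in groups])
    return reduced, groups


# Synthetic data: two independent signals, each observed twice with noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.05 * rng.normal(size=200),
    base[:, 1],
    base[:, 1] + 0.05 * rng.normal(size=200),
])
reduced, groups = aggregate_correlated_features(X, threshold=0.9)
print(groups, reduced.shape)
```

Because every original feature ends up in exactly one averaged group, the reduced features remain interpretable as means of named inputs, which is the property the abstract highlights. A linear correlation test like this one is also exactly where the linearity assumption enters, which is the limitation the paper sets out to relax.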

