Outlier Detection as Instance Selection Method for Feature Selection in Time Series Classification

11/16/2021
by David Cemernek, et al.

For machine learning algorithms to extract knowledge from raw data, the data must first be cleaned, transformed, and put into a machine-appropriate form. This often very time-consuming phase is referred to as preprocessing. An important preprocessing step is feature selection, which aims to improve the performance of prediction models by reducing the number of features in a dataset. Within these datasets, instances of different events are often imbalanced: certain normal events are over-represented, while other rare events are under-represented. Typically, these rare events are of special interest since they have more discriminative power than normal events. The aim of this work was to filter the instances provided to feature selection methods so that these rare instances are retained, and thus to positively influence the feature selection process. In the course of this work, we were able to show that this filtering has a positive effect on the performance of classification models and that outlier detection methods are suitable for this filtering. For some datasets, the resulting increase in performance was only a few percent, but for others we achieved increases of up to 16 percent. This work should lead to improved predictive models and better interpretability of feature selection during the preprocessing phase. In the spirit of open science and to increase transparency within our research field, we have made all our source code and the results of our experiments available in a public repository.
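The idea of using outlier detection as an instance selection step before feature selection can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the outlier detectors or feature selection methods used, so the sketch assumes a simple mean absolute z-score as the outlier score and a Fisher score as the feature ranking criterion. All function names, the `keep_fraction` parameter, and the toy data are hypothetical, and the features are assumed to have already been extracted from the time series.

```python
import math
import statistics


def outlier_scores(rows):
    """Mean absolute z-score of each instance across all features."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant features (standard deviation of zero).
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        sum(abs(v - m) / s for v, m, s in zip(row, means, stds)) / len(row)
        for row in rows
    ]


def select_outlying_instances(rows, keep_fraction=0.5):
    """Indices of the most outlying instances (the instance selection step)."""
    scores = outlier_scores(rows)
    n_keep = max(1, math.ceil(keep_fraction * len(rows)))
    ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def fisher_scores(rows, labels):
    """Fisher score per feature: between-class over within-class scatter."""
    scores = []
    for col in zip(*rows):
        overall = statistics.fmean(col)
        between = within = 0.0
        for cls in set(labels):
            vals = [v for v, y in zip(col, labels) if y == cls]
            between += len(vals) * (statistics.fmean(vals) - overall) ** 2
            within += len(vals) * statistics.pvariance(vals)
        # Small constant avoids division by zero for noise-free features.
        scores.append(between / (within + 1e-12))
    return scores


def select_features(rows, labels, k, keep_fraction=0.5):
    """Rank features using only the instances kept by the outlier filter."""
    kept = select_outlying_instances(rows, keep_fraction)
    sub_rows = [rows[i] for i in kept]
    sub_labels = [labels[i] for i in kept]
    scores = fisher_scores(sub_rows, sub_labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: eight normal instances (label 0), two rare ones (label 1).
# Feature 0 separates the classes, feature 1 is noise, feature 2 is constant.
rows = [
    [0.0, 1.0, 5.0], [0.1, 2.0, 5.0], [-0.1, 1.5, 5.0], [0.05, 1.2, 5.0],
    [0.0, 1.8, 5.0], [-0.05, 1.4, 5.0], [0.1, 1.6, 5.0], [-0.1, 1.3, 5.0],
    [5.0, 1.5, 5.0], [5.2, 1.4, 5.0],
]
labels = [0] * 8 + [1] * 2

kept_instances = select_outlying_instances(rows, keep_fraction=0.5)
chosen_features = select_features(rows, labels, k=1, keep_fraction=0.5)
print(kept_instances, chosen_features)
```

In this sketch the filter is label-agnostic, so `keep_fraction` has to be large enough that the retained subset still contains more than one class; otherwise a supervised criterion such as the Fisher score cannot discriminate between features. In practice the z-score could be swapped for any outlier detector that yields a per-instance score.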


