DataAssist: A Machine Learning Approach to Data Cleaning and Preparation

07/14/2023
by Kartikay Goyle, et al.

Current automated machine learning (ML) tools are model-centric, focusing on model selection and parameter optimization. However, the majority of the time in data analysis is devoted to data cleaning and wrangling, for which limited tools are available. Here we present DataAssist, an automated data preparation and cleaning platform that enhances dataset quality using ML-informed methods. We show that DataAssist provides a pipeline for exploratory data analysis and data cleaning, including generating visualizations for user-selected variables, unifying data annotation, suggesting anomaly removal, and preprocessing data. The exported dataset can be readily integrated with other autoML tools or user-specified models for downstream analysis. Our data-centric tool is applicable to a variety of fields, including economics, business, and forecasting applications, saving over 50% of the time spent on data cleansing and preparation.
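As a rough illustration of the kind of steps the abstract lists (unifying annotation, suggesting anomalies, preprocessing), the following is a minimal sketch using only the Python standard library. It is not DataAssist's code or API: the function names `unify_labels`, `flag_outliers_iqr`, and `impute_median` are hypothetical, and the choices of Tukey's IQR fences for anomaly suggestions and median imputation for missing values are assumptions made for the example, not methods confirmed by the source.

```python
from statistics import median, quantiles


def unify_labels(values):
    """Unify categorical annotations by trimming whitespace and lowercasing.

    Assumption: inconsistent labels differ only by case and padding.
    Non-string entries are passed through unchanged.
    """
    return [v.strip().lower() if isinstance(v, str) else v for v in values]


def flag_outliers_iqr(values, k=1.5):
    """Suggest anomalies: True where a value falls outside Tukey's fences.

    Assumption: an IQR rule stands in for whatever detector a real tool uses.
    Missing entries (None) are never flagged.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # quartile cut points
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v is not None and (v < low or v > high) for v in values]


def impute_median(values):
    """Preprocess a numeric column by replacing None with the column median."""
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    labels = [" Yes", "yes ", "NO"]
    amounts = [10, 11, 12, 13, 14, 15, 16, 500]
    print(unify_labels(labels))
    print(flag_outliers_iqr(amounts))
    print(impute_median([1, None, 3]))
```

The flags are returned as suggestions rather than applied automatically, mirroring the abstract's wording that anomaly removal is suggested to the user.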

