Improving Data Quality through Deep Learning and Statistical Models

by Wei Dai, et al.

Traditional data quality control methods rely on users' experience or previously established business rules, which limits performance, makes the process time consuming, and yields lower-than-desirable accuracy. With deep learning, we can leverage computing resources and advanced techniques to overcome these challenges and provide greater value to users. In this paper, we first review relevant works and discuss machine learning techniques, tools, and statistical quality models. Second, we propose a data quality framework based on deep learning and statistical model algorithms for identifying data quality issues. Third, we use salary-level data from an open dataset published by the state of Arkansas to demonstrate how to identify outlier data and improve data quality via deep learning. Finally, we discuss future work.
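The abstract does not spell out which statistical quality model the framework uses, so as a minimal illustrative sketch (the function name, threshold, and sample values below are our assumptions, not the paper's), a robust modified z-score check could flag implausible salary values like this:

```python
import statistics

def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the median and
    the median absolute deviation, MAD) exceeds the threshold.

    A robust statistical stand-in for the kind of outlier check the
    paper combines with deep learning; MAD is used instead of the
    standard deviation so a single extreme value cannot mask itself.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread: nothing can be flagged this way
    # 0.6745 rescales MAD to be comparable to a standard deviation
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical salary records with one implausible entry.
salaries = [42000, 45000, 47000, 43500, 46000, 44000, 900000]
print(flag_outliers(salaries))  # → [900000]
```

A plain z-score would miss this outlier here: the 900000 entry inflates the standard deviation enough that its own z-score stays below 3, which is why a median/MAD-based rule is the safer statistical baseline.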




