DCoM: A Deep Column Mapper for Semantic Data Type Detection

06/24/2021
by Subhadip Maji, et al.

Detecting semantic data types is a crucial task in data science, supporting automated data cleaning, schema matching, data discovery, semantic data type normalization, and sensitive data identification. Existing methods based on regular expressions or dictionary lookups are not robust to dirty or unseen data and can predict only a small number of semantic data types. Existing machine learning methods extract a large number of engineered features from the data and train logistic regression, random forest, or feedforward neural network models on them. In this paper, we introduce DCoM, a collection of multi-input NLP-based deep neural networks for detecting semantic data types: instead of extracting a large number of features from the data, we feed the raw values of columns (or instances) to the model as text. We train DCoM on 686,765 data columns extracted from the VizNet corpus, covering 78 different semantic data types. DCoM outperforms other contemporary results on the same dataset by a significant margin.
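To make the idea of feeding raw column values to a model as text more concrete, the following is a minimal sketch of how a column might be serialized into a single string before tokenization. It is not the authors' preprocessing: the helper name `column_to_text`, the ` | ` separator, the `max_values` cap, and the random subsampling are all assumptions chosen for illustration.

```python
import random
from typing import Iterable


def column_to_text(
    values: Iterable[object],
    max_values: int = 20,
    sep: str = " | ",
    seed: int = 0,
) -> str:
    """Serialize raw column values into one text string.

    Hypothetical helper: the separator, the cap on the number of values,
    and the sampling strategy are illustrative assumptions, not details
    taken from the paper.
    """
    # Drop missing and empty cells; keep every other value as raw text.
    cleaned = [str(v).strip() for v in values if v is not None and str(v).strip()]

    # Subsample long columns so the resulting text stays short.
    if len(cleaned) > max_values:
        rng = random.Random(seed)
        cleaned = rng.sample(cleaned, max_values)

    return sep.join(cleaned)


if __name__ == "__main__":
    city_column = ["Paris", None, " Rome ", "", "Berlin"]
    print(column_to_text(city_column))
```

A string produced this way could then be passed to any text tokenizer, which is what distinguishes this style of input from a vector of hand-engineered statistics.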


research
05/25/2019

Sherlock: A Deep Learning Approach to Semantic Data Type Detection

Correctly detecting the semantic type of data columns is crucial for dat...
research
11/14/2019

Sato: Contextual Semantic Type Detection in Tables

Detecting the semantic types of data columns in relational tables is imp...
research
07/24/2023

Comprehending Semantic Types in JSON Data with Graph Neural Networks

Semantic types are a more powerful and detailed way of describing data t...
research
10/30/2020

Semantic Labeling Using a Deep Contextualized Language Model

Generating schema labels automatically for column values of data tables ...
research
12/15/2020

Semantic Annotation for Tabular Data

Detecting semantic concept of columns in tabular data is of particular i...
research
04/12/2022

A Machine Learning Approach to Determine the Semantic Versioning Type of npm Packages Releases

Semantic versioning policy is widely used to indicate the level of chang...
research
07/09/2021

Can Deep Neural Networks Predict Data Correlations from Column Names?

For humans, it is often possible to predict data correlations from colum...
