Identifying Table Structure in Documents using Conditional Generative Adversarial Networks

01/13/2020
by   Nataliya Le Vine, et al.
0

In many industries, as well as in academic research, information is primarily transmitted in the form of unstructured documents (this article, for example). Hierarchically-related data is rendered as tables, and extracting information from tables in such documents presents a significant challenge. Many existing methods take a bottom-up approach, first integrating lines into cells, then cells into rows or columns, and finally inferring a structure from the resulting 2-D layout. But such approaches neglect the available prior information relating to table structure, namely that the table is merely an arbitrary representation of a latent logical structure. We propose a top-down approach, first using a conditional generative adversarial network to map a table image into a standardised `skeleton' table form denoting approximate row and column borders without table content, then deriving latent table structure using xy-cut projection and Genetic Algorithm optimisation. The approach is easily adaptable to different table configurations and requires small data set sizes for training.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/03/2019

Extracting Tables from Documents using Conditional Generative Adversarial Networks and Genetic Algorithms

Extracting information from tables in documents presents a significant c...
research
11/14/2022

Row Conditional-TGAN for generating synthetic relational databases

Besides reproducing tabular data properties of standalone tables, synthe...
research
11/13/2021

Visual Understanding of Complex Table Structures from Document Images

Table structure recognition is necessary for a comprehensive understandi...
research
03/02/2022

TableFormer: Table Structure Understanding with Transformers

Tables organize valuable content in a concise and compact representation...
research
01/15/2019

Integrating and querying similar tables from PDF documents using deep learning

Large amount of public data produced by enterprises are in semi-structur...
research
02/20/2023

Compressing Tabular Data via Latent Variable Estimation

Data used for analytics and machine learning often take the form of tabl...
research
09/10/2021

Neural Networks for Latent Budget Analysis of Compositional Data

Compositional data are non-negative data collected in a rectangular matr...

Please sign up or login with your details

Forgot password? Click here to reset