On Cropped versus Uncropped Training Sets in Tabular Structure Detection

10/06/2021
by Yakup Akkaya, et al.

Automated document processing for tabular information extraction is highly desired in many organizations, from industry to government. Prior work has addressed this problem as two tasks: table detection and table structure detection. Deep learning solutions have shown promising results on both. However, the impact of dataset structure on table structure detection has not been investigated. In this study, we compare table structure detection performance on cropped and uncropped datasets. The cropped set consists only of table images cropped from documents, assuming that tables are detected perfectly. The uncropped set consists of regular document images. Experiments show that deep learning models can improve detection performance by up to 9% in average precision and average recall on the cropped versions. Furthermore, compared with the uncropped versions, the impact of cropped images is negligible at Intersection over Union (IoU) values of 50%. However, beyond 70% IoU, cropped datasets provide significantly higher detection performance.
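To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' code. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) in pixels; the helper names `iou`, `to_cropped_coords`, and `is_match` are hypothetical and chosen only for illustration. It shows how a structure annotation from a full page could be re-expressed relative to a perfectly detected table crop, and how a prediction is accepted or rejected at a given IoU threshold.

```python
from typing import Tuple

# Assumed box convention (not from the paper): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def to_cropped_coords(box: Box, table: Box) -> Box:
    """Shift a page-level box into the coordinate frame of a table crop.

    Hypothetical helper: assumes the crop is exactly the ground-truth
    table box, i.e. the 'perfect table detection' setting.
    """
    tx, ty = table[0], table[1]
    return (box[0] - tx, box[1] - ty, box[2] - tx, box[3] - ty)


def is_match(pred: Box, truth: Box, threshold: float) -> bool:
    """A prediction counts as correct if its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


if __name__ == "__main__":
    table = (100.0, 200.0, 500.0, 600.0)   # table region on the page
    row = (120.0, 230.0, 220.0, 260.0)     # a row annotation on the page
    print(to_cropped_coords(row, table))   # row relative to the crop

    truth = (0.0, 0.0, 10.0, 10.0)
    pred = (0.0, 0.0, 10.0, 5.0)           # covers half of the truth box
    print(is_match(pred, truth, 0.50), is_match(pred, truth, 0.75))
```

Because the same prediction can pass at a loose threshold and fail at a strict one, reporting results across several IoU thresholds separates coarse localization from precise boundary agreement.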


