Self-supervised Deep Reconstruction of Mixed Strip-shredded Text Documents

07/01/2020
by Thiago M. Paixão, et al.

The reconstruction of shredded documents consists of coherently arranging fragments of paper (shreds) to recover the original document(s). A great challenge in computational reconstruction is to properly evaluate the compatibility between the shreds. While traditional pixel-based approaches are not robust to real shredding, more sophisticated solutions significantly compromise time performance. The solution presented in this work extends our previous deep learning method for single-page reconstruction to a more realistic and complex scenario: the reconstruction of several mixed shredded documents at once. In our approach, the compatibility evaluation is modeled as a two-class (valid or invalid) pattern recognition problem. The model is trained in a self-supervised manner on samples extracted from simulated-shredded documents, which obviates manual annotation. Experimental results on three datasets – including a new collection of 100 strip-shredded documents produced for this work – show that the proposed method outperforms the competing ones in complex scenarios, achieving accuracy above 90%.
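
The self-supervised labeling idea can be illustrated with a minimal sketch: a page is cut into strips by simulation, and the known strip order supplies the valid/invalid labels without manual annotation. The function names (`simulate_shredding`, `extract_pair_samples`), the `border` width, and the random choice of invalid pairs are assumptions made for illustration; the abstract does not specify the sampling scheme or the classifier used in the paper.

```python
import numpy as np

def simulate_shredding(page, num_strips):
    """Cut a page (H x W array) into vertical strips of near-equal width."""
    return np.array_split(page, num_strips, axis=1)

def extract_pair_samples(strips, border=2, rng=None):
    """Build labeled samples from strip borders (hypothetical scheme).

    Label 1 (valid): right border of strip i joined with the left border
    of strip i + 1, as in the original page.
    Label 0 (invalid): right border of strip i joined with the left border
    of a randomly chosen strip other than i and i + 1.
    Requires at least 3 strips, each at least `border` pixels wide.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(strips)
    samples, labels = [], []
    for i in range(n - 1):
        right_edge = strips[i][:, -border:]
        # Valid pair: the true right-hand neighbor.
        samples.append(np.hstack([right_edge, strips[i + 1][:, :border]]))
        labels.append(1)
        # Invalid pair: any strip that is not the true neighbor.
        candidates = [j for j in range(n) if j not in (i, i + 1)]
        j = int(rng.choice(candidates))
        samples.append(np.hstack([right_edge, strips[j][:, :border]]))
        labels.append(0)
    return np.stack(samples), np.array(labels)

# Toy "page": a 20 x 12 array standing in for a scanned document image.
page = np.arange(20 * 12).reshape(20, 12)
strips = simulate_shredding(page, 4)
samples, labels = extract_pair_samples(strips, border=2)
```

Each sample is a small image centered on a candidate cut, so a binary classifier trained on such pairs could score the compatibility of any two shreds at reconstruction time.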

research
03/23/2020

Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning

The reconstruction of shredded documents consists in arranging the piece...
research
05/30/2023

Analyzing the Sample Complexity of Self-Supervised Image Reconstruction Methods

Supervised training of deep neural networks on pairs of clean image and ...
research
06/02/2021

Self-Supervised Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference

We present a novel model for the problem of ranking a collection of docu...
research
02/16/2021

Large-Context Conversational Representation Learning: Self-Supervised Learning for Conversational Documents

This paper presents a novel self-supervised learning method for handling...
research
11/04/2020

Handwriting Classification for the Analysis of Art-Historical Documents

Digitized archives contain and preserve the knowledge of generations of ...
research
07/03/2020

Noise2Filter: fast, self-supervised learning and real-time reconstruction for 3D Computed Tomography

At X-ray beamlines of synchrotron light sources, the achievable time-res...
research
05/26/2022

AI for Porosity and Permeability Prediction from Geologic Core X-Ray Micro-Tomography

Geologic cores are rock samples that are extracted from deep under the g...
