Sequence to Sequence Learning for Optical Character Recognition

by Devendra Kumar Sahu, et al.

We propose an end-to-end recurrent encoder-decoder sequence learning approach for printed-text Optical Character Recognition (OCR). In contrast to present-day state-of-the-art OCR solutions, which use a connectionist temporal classification (CTC) output layer, our approach makes minimal assumptions about the structure and length of the sequence. We use a two-step encoder-decoder approach: (a) a recurrent encoder reads a variable-length printed word image and encodes it into a fixed-dimensional embedding; (b) this fixed-dimensional embedding is then consumed by a decoder, which converts it into a variable-length text output. Our architecture gives competitive performance relative to a CTC output layer while operating in a more natural setting. The deep word-image embedding learnt by the encoder can also be used for printed-text retrieval systems. An expressive fixed-dimensional embedding for any variable-length input makes retrieval faster and more efficient, which is not possible with other recurrent neural network architectures. We empirically investigate the expressiveness and learnability of long short-term memory networks (LSTMs) in the sequence-to-sequence learning regime by training our network for prediction tasks in segmentation-free printed-text OCR. The utility of the proposed architecture for printed text is demonstrated by quantitative and qualitative evaluation on two tasks: word prediction and retrieval.
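The core idea above can be sketched in a few lines: an encoder RNN reads a variable-width word image (a sequence of pixel columns) into a fixed-size vector, and a decoder RNN emits characters from that vector until an end-of-sequence token. The sketch below is purely illustrative and makes simplifying assumptions: plain tanh RNN cells stand in for the paper's LSTMs, the weights are random and untrained, greedy decoding replaces any learned search, and all dimensions (`IN_DIM`, `HID`, `VOCAB`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

IN_DIM, HID, VOCAB = 8, 16, 5   # image-column height, hidden size, charset size (all hypothetical)
EOS = 0                          # index of the end-of-sequence token

# Encoder parameters (a simplified tanh RNN stands in for the paper's LSTM)
W_xh = rng.standard_normal((HID, IN_DIM)) * 0.1
W_hh = rng.standard_normal((HID, HID)) * 0.1
# Decoder parameters
W_dh = rng.standard_normal((HID, HID)) * 0.1
W_eh = rng.standard_normal((HID, VOCAB)) * 0.1   # embeds the previously emitted token
W_hy = rng.standard_normal((VOCAB, HID)) * 0.1

def encode(columns):
    """Read a variable-length word image (a sequence of pixel columns)
    and return a fixed-dimensional embedding: the final hidden state."""
    h = np.zeros(HID)
    for x in columns:
        h = np.tanh(W_xh @ x + W_hh @ h)
    return h

def decode(embedding, max_len=10):
    """Greedily emit token indices until <eos>, conditioned on the embedding."""
    h, prev, out = embedding, EOS, []
    for _ in range(max_len):
        onehot = np.eye(VOCAB)[prev]
        h = np.tanh(W_dh @ h + W_eh @ onehot)
        prev = int(np.argmax(W_hy @ h))
        if prev == EOS:
            break
        out.append(prev)
    return out

# Two word images of different widths map to embeddings of the same size,
# which is what makes fixed-dimensional retrieval possible.
short_img = rng.standard_normal((5, IN_DIM))
long_img = rng.standard_normal((12, IN_DIM))
e1, e2 = encode(short_img), encode(long_img)
print(e1.shape, e2.shape)   # both (16,) regardless of input width
print(decode(e1))           # a variable-length list of token indices
```

Because both embeddings live in the same fixed-dimensional space, retrieval reduces to nearest-neighbour search over encoder outputs, with no need to run the decoder at query time.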




