Full Page Handwriting Recognition via Image to Sequence Extraction

by Sumeet S. Singh, et al.

We present a neural-network-based Handwritten Text Recognition (HTR) model architecture that can be trained to recognize full pages of handwritten or printed text without image segmentation. Because it is based on an Image to Sequence architecture, it can be trained to extract the text present in an image and sequence it correctly without imposing any constraints on language, shape of characters, or orientation and layout of text and non-text. The model can also be trained to generate auxiliary markup related to formatting, layout and content. We use a character-level token vocabulary, thereby supporting proper nouns and terminology of any subject. The model achieves a new state of the art in full-page recognition on the IAM dataset. When evaluated on scans of real-world handwritten free-form test answers (a dataset beset with curved and slanted lines, drawings, tables, math, chemistry and other symbols), it performs better than all commercially available HTR APIs. It is deployed in production as part of a commercial web application.
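To make the character-level vocabulary idea concrete, here is a minimal sketch of how such a vocabulary might be built and used to turn target text into token ids for a sequence decoder. It is an illustration only, not the authors' implementation: the special tokens (`<pad>`, `<sos>`, `<eos>`, `<unk>`) and the helper names `build_vocab`, `encode` and `decode` are assumptions introduced here.

```python
from typing import Dict, List

# Hypothetical special tokens; the abstract does not say which ones the model uses.
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map every distinct character seen in the training texts to an integer id."""
    chars = sorted({ch for text in texts for ch in text})
    return {tok: i for i, tok in enumerate(SPECIALS + chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Wrap the text in start/end markers; unseen characters map to <unk>."""
    unk = vocab["<unk>"]
    body = [vocab.get(ch, unk) for ch in text]
    return [vocab["<sos>"]] + body + [vocab["<eos>"]]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Invert encode(), dropping all special tokens (including <unk>)."""
    inverse = {i: tok for tok, i in vocab.items()}
    return "".join(inverse[i] for i in ids if inverse[i] not in SPECIALS)


# Proper nouns and chemistry notation need no dedicated word entries:
# they are spelled out one character at a time.
vocab = build_vocab(["Schrödinger", "NaCl + H2O"])
ids = encode("NaCl", vocab)
print(ids, decode(ids, vocab))
```

Because every token is a single character, any name or term whose characters appear in the training data can be emitted, which is the property the abstract attributes to this choice. The trade-off is longer output sequences than a word or subword vocabulary would produce.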



DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition

Unconstrained handwritten document recognition is a challenging computer...

Recognition of Handwritten Roman Script Using Tesseract Open source OCR Engine

In the present work, we have used Tesseract 2.01 open source Optical Cha...

Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions

Handwritten Text Recognition (HTR) in free-layout pages is a challenging...

Towards End-to-end Handwritten Document Recognition

Handwritten text recognition has been widely studied in the last decades...

VML-MOC: Segmenting a multiply oriented and curved handwritten text lines dataset

This paper publishes a natural and very complicated dataset of handwritt...

AKHCRNet: Bengali Handwritten Character Recognition Using Deep Learning

I propose a state of the art deep neural architectural solution for hand...

Handwritten Stenography Recognition and the LION Dataset

Purpose: In this paper, we establish a baseline for handwritten stenogra...
