Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes

10/12/2021
by Christoph Wick, et al.

In contrast to Connectionist Temporal Classification (CTC) approaches, Sequence-To-Sequence (S2S) models for Handwritten Text Recognition (HTR) suffer from errors such as skipped or repeated words, which often occur at the end of a sequence. In this paper, to combine the best of both approaches, we propose to use the CTC-Prefix-Score during S2S decoding: during beam search, paths that are invalid according to the CTC confidence matrix are penalised. Our network architecture is composed of a Convolutional Neural Network (CNN) as visual backbone, bidirectional Long-Short-Term-Memory cells (LSTMs) as encoder, and a Transformer decoder with inserted mutual attention layers. The CTC confidences are computed on the encoder output, while the Transformer is only used for character-wise S2S decoding. We evaluate this setup on three HTR data sets: IAM, Rimes, and StAZH. On IAM, we achieve a competitive Character Error Rate (CER) of 2.95% when pretraining our model on synthetic data and including a character-based language model for contemporary English. Compared to other state-of-the-art approaches, our model requires about 10-20 times fewer parameters. The shared implementation is available on GitHub: https://github.com/Planet-AI-GmbH/tfaip-hybrid-ctc-s2s.
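The abstract only names the mechanism, so the following is a minimal, illustrative sketch of how a CTC prefix score can be combined with the S2S decoder score when ranking beam-search hypotheses. It is not the authors' implementation (see the linked repository for that): the function names `ctc_prefix_log_score` and `joint_score`, the NumPy-based layout, and the interpolation weight `ctc_weight` are assumptions made for illustration. The prefix score follows the standard CTC prefix-probability recursion, i.e. the total probability of all CTC paths whose collapsed labelling begins with the hypothesised character sequence.

```python
# Illustrative sketch only (not the paper's code): CTC prefix scoring for
# joint CTC/S2S beam-search rescoring. Names such as ctc_prefix_log_score,
# joint_score and ctc_weight are hypothetical.
import numpy as np

NEG_INF = -np.inf


def ctc_prefix_log_score(log_probs: np.ndarray, prefix: list, blank: int = 0) -> float:
    """Log probability that the CTC output starts with `prefix`,
    given frame-wise log posteriors `log_probs` of shape (T, V)."""
    T = log_probs.shape[0]
    if not prefix:
        return 0.0  # every labelling starts with the empty prefix

    # Forward variables for the current (shorter) prefix g:
    # gamma_n[t] = log P(frames 1..t emit exactly g, path ends in g's last char)
    # gamma_b[t] = log P(frames 1..t emit exactly g, path ends in blank)
    gamma_n = np.full(T + 1, NEG_INF)
    gamma_b = np.full(T + 1, NEG_INF)
    gamma_b[0] = 0.0
    for t in range(1, T + 1):
        gamma_b[t] = gamma_b[t - 1] + log_probs[t - 1, blank]

    log_psi = NEG_INF
    last = None
    for c in prefix:
        new_n = np.full(T + 1, NEG_INF)
        new_b = np.full(T + 1, NEG_INF)
        log_psi = NEG_INF  # accumulates log P(output starts with g + [c])
        for t in range(1, T + 1):
            # phi: paths that emit g by frame t-1 and can start emitting c at frame t;
            # a repeated character needs an intervening blank, hence the c != last check.
            phi = gamma_b[t - 1]
            if c != last:
                phi = np.logaddexp(phi, gamma_n[t - 1])
            new_n[t] = np.logaddexp(new_n[t - 1], phi) + log_probs[t - 1, c]
            new_b[t] = np.logaddexp(new_b[t - 1], new_n[t - 1]) + log_probs[t - 1, blank]
            log_psi = np.logaddexp(log_psi, phi + log_probs[t - 1, c])
        gamma_n, gamma_b, last = new_n, new_b, c
    return float(log_psi)


def joint_score(s2s_logprob: float, ctc_prefix_logprob: float, ctc_weight: float = 0.5) -> float:
    """Interpolate S2S decoder and CTC prefix scores for ranking beam hypotheses."""
    return (1.0 - ctc_weight) * s2s_logprob + ctc_weight * ctc_prefix_logprob


if __name__ == "__main__":
    # Toy example: 5 frames, vocabulary {blank=0, 'a'=1, 'b'=2}.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(5, 3))
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ctc = ctc_prefix_log_score(log_probs, [1, 2])  # log P(output starts with "ab")
    print(ctc, joint_score(-1.2, ctc))
```

In a full decoder, the interpolated score would be computed for every candidate extension of every beam, so that hypotheses the CTC confidence matrix considers implausible (e.g. skipped or hallucinated words) are pushed down the beam.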
