Self-Training of Handwritten Word Recognition for Synthetic-to-Real Adaptation

06/07/2022
by Fabian Wolf, et al.

The performance of Handwritten Text Recognition (HTR) models is largely determined by the availability of labeled, representative training samples. In many application scenarios, however, labeled samples are scarce or costly to obtain. In this work, we propose a self-training approach that trains an HTR model solely on synthetic samples and unlabeled data. The proposed training scheme uses an initial model trained on synthetic data to make predictions for the unlabeled target dataset. Starting from this initial model, which performs rather poorly, we show that considerable adaptation is possible by training against the predicted pseudo-labels. The investigated self-training strategy does not require any manually annotated training samples. We evaluate the proposed method on four widely used benchmark datasets and show its effectiveness in closing the gap to a model trained in a fully supervised manner.
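The training scheme described above can be illustrated with a minimal sketch. This is not the authors' implementation: a toy nearest-centroid classifier on one-dimensional features stands in for the HTR model, and the function names (`fit_centroids`, `predict`, `self_train`), the data values, and the number of rounds are all assumptions made for illustration. Only the overall loop follows the abstract: train on labeled synthetic data, predict pseudo-labels for the unlabeled target data, then retrain against those pseudo-labels.

```python
from statistics import mean


def fit_centroids(samples, labels):
    """Toy stand-in for model training: one mean feature value per label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    # A label that receives no samples simply drops out of the model.
    return {y: mean(xs) for y, xs in groups.items()}


def predict(centroids, samples):
    """Assign each sample the label of the nearest centroid."""
    return [min(centroids, key=lambda y: abs(x - centroids[y])) for x in samples]


def self_train(synthetic_x, synthetic_y, unlabeled_x, rounds=3):
    """Self-training loop: no manually annotated target samples are used."""
    # Step 1: initial model trained on labeled synthetic data only.
    model = fit_centroids(synthetic_x, synthetic_y)
    for _ in range(rounds):
        # Step 2: predict pseudo-labels for the unlabeled target data.
        pseudo_labels = predict(model, unlabeled_x)
        # Step 3: retrain against the predicted pseudo-labels.
        model = fit_centroids(unlabeled_x, pseudo_labels)
    return model


# Hypothetical data: the target domain is shifted relative to the synthetic one.
synthetic_x = [0.0, 0.2, 1.0, 1.2]
synthetic_y = ["a", "a", "b", "b"]
unlabeled_x = [0.3, 0.5, 0.7, 1.8, 2.0, 2.2]  # true labels: a, a, a, b, b, b

initial_model = fit_centroids(synthetic_x, synthetic_y)
initial_predictions = predict(initial_model, unlabeled_x)

adapted_model = self_train(synthetic_x, synthetic_y, unlabeled_x)
adapted_predictions = predict(adapted_model, unlabeled_x)

print("initial:", initial_predictions)
print("adapted:", adapted_predictions)
```

In this toy setting the synthetic-only model mislabels one target sample, and retraining on its own predictions moves the decision boundary toward the target distribution. A real HTR system would replace the centroid classifier with a sequence recognition model and would need safeguards that this sketch omits, for example against a class collapsing when it receives no pseudo-labels.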

