How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

05/04/2023
by Vittorio Pippi, et al.

Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and historical manuscripts in large benchmark datasets. Nonetheless, these models struggle to match that performance on manuscripts with peculiar characteristics, such as language, paper support, ink, and the author's handwriting. This issue is especially relevant for valuable but small collections of documents preserved in historical archives, for which obtaining sufficient annotated training data is costly or, in some cases, infeasible. To overcome this challenge, a possible solution is to pretrain HTR models on large datasets and then fine-tune them on small single-author collections. In this paper, we consider both large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model. Through extensive experimental analysis, which also varies the number of fine-tuning lines, we give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.
