HDIB1M – Handwritten Document Image Binarization 1 Million Dataset

01/27/2021
by Kaustubh Sadekar, et al.

Handwritten document image binarization is a challenging task due to the high diversity in the content, page style, and condition of the documents. While traditional thresholding methods fail to generalize to such challenging scenarios, deep learning-based methods generalize well but require large amounts of training data. Current datasets for handwritten document image binarization are limited in size and fail to represent several challenging real-world scenarios. To address this problem, we propose HDIB1M, a handwritten document image binarization dataset of 1M images. We also present a novel method for generating this dataset. To show the effectiveness of our dataset, we train a deep learning model, UNetED, on it and evaluate its performance on other publicly available datasets. The dataset and the code will be made available to the community.
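The abstract contrasts learned models with traditional thresholding. For reference, here is a minimal sketch of one such classical method, Otsu's global threshold, in plain Python. The choice of Otsu, the function names `otsu_threshold` and `binarize`, and the assumption of 8-bit grayscale input (values 0-255, dark ink on light paper) are ours for illustration. The abstract does not say which thresholding baselines the paper considered.

```python
from typing import List

# Illustrative sketch (assumed, not taken from the HDIB1M paper):
# Otsu's method picks one global gray-level cut-off for the whole page.

def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class (ink), the rest the light class (paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_b = 0      # intensity sum of the dark class so far
    weight_b = 0   # pixel count of the dark class so far
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_b += hist[t]
        if weight_b == 0:
            continue  # no dark pixels yet
        weight_f = total - weight_b
        if weight_f == 0:
            break  # every pixel is in the dark class; no split left to test
        sum_b += t * hist[t]
        mean_b = sum_b / weight_b
        mean_f = (sum_all - sum_b) / weight_f
        # Between-class variance, up to a constant factor of 1 / total**2
        var_between = weight_b * weight_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Apply one global Otsu threshold: ink -> 0, background -> 255."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]


if __name__ == "__main__":
    page = [[10, 10, 200, 200], [10, 200, 200, 200]]
    print(binarize(page))  # [[0, 0, 255, 255], [0, 255, 255, 255]]
```

Because a single threshold is shared by the whole page, a stain or shadow darker than the chosen level is marked as ink, while faint strokes brighter than it are lost. This is the kind of variability in page condition that the abstract cites as a reason such methods fail to generalize.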

research
02/12/2022

Recognition-free Question Answering on Handwritten Document Collections

In recent years, considerable progress has been made in the research are...
research
01/19/2021

Unsupervised Deep Learning for Handwritten Page Segmentation

Segmenting handwritten document images into regions with homogeneous pat...
research
09/05/2017

PageNet: Page Boundary Extraction in Historical Handwritten Documents

When digitizing a document into an image, it is common to include a surr...
research
01/10/2019

New Radon Transform Based Texture Features of Handwritten Document

In this paper, we present some new features describing the handwritten d...
research
10/27/2022

Efficient few-shot learning for pixel-precise handwritten document layout analysis

Layout analysis is a task of uttermost importance in ancient handwritten...
research
09/06/2017

Automatic Document Image Binarization using Bayesian Optimization

Document image binarization is often a challenging task due to various f...
research
05/31/2020

Modified Segmentation Algorithm for Recognition of Older Geez Scripts Written on Vellum

Recognition of handwritten document aims at transforming document images...