Convolutional Neural Networks for Font Classification

08/11/2017
by Chris Tensmeyer, et al.

Classifying pages or text lines into font categories aids transcription because single-font Optical Character Recognition (OCR) is generally more accurate than omni-font OCR. We present a simple framework based on Convolutional Neural Networks (CNNs), in which a CNN is trained to classify small patches of text into predefined font classes. To classify a page or line image, we average the CNN predictions over densely extracted patches. We show that this method achieves state-of-the-art performance on a challenging dataset of 40 Arabic computer fonts, with 98.8% line-level accuracy. The same method also achieves the highest reported accuracy, 86.6%, for classifying scribal script classes at the page level on medieval Latin manuscripts. Finally, we analyze the features the CNN learns on Latin manuscripts and find evidence that it both learns the defining morphological differences between scribal script classes and overfits to class-correlated nuisance factors. We propose a novel form of data augmentation that improves robustness to text darkness, further increasing classification performance.
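The patch-averaging step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`extract_patches`, `classify_image`), the patch size and stride, and `toy_patch_classifier`, which stands in for a trained CNN, are all hypothetical. The abstract also does not say whether raw scores or probabilities are averaged; this sketch averages softmax probabilities.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A grayscale image as rows of pixel intensities in [0, 1].
Image = List[List[float]]
# Hypothetical stand-in for a trained CNN: maps one patch to per-class logits.
PatchClassifier = Callable[[Image], Sequence[float]]


def extract_patches(image: Image, patch_h: int, patch_w: int, stride: int) -> List[Image]:
    """Densely extract patch_h x patch_w patches on a regular grid.

    Border pixels past the last full stride step are not covered.
    """
    height, width = len(image), len(image[0])
    if height < patch_h or width < patch_w:
        raise ValueError("image is smaller than a single patch")
    patches = []
    for top in range(0, height - patch_h + 1, stride):
        for left in range(0, width - patch_w + 1, stride):
            patches.append([row[left:left + patch_w] for row in image[top:top + patch_h]])
    return patches


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_image(
    image: Image,
    patch_classifier: PatchClassifier,
    patch_h: int,
    patch_w: int,
    stride: int,
) -> Tuple[int, List[float]]:
    """Average per-patch class probabilities; return (best class, averaged probs)."""
    patches = extract_patches(image, patch_h, patch_w, stride)
    probs = [softmax(patch_classifier(p)) for p in patches]
    num_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    best = max(range(num_classes), key=lambda c: avg[c])
    return best, avg


def toy_patch_classifier(patch: Image) -> List[float]:
    """Hypothetical two-class scorer based on mean intensity (NOT a CNN)."""
    mean = sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))
    return [mean, 1.0 - mean]


if __name__ == "__main__":
    line_image = [[1.0] * 32 for _ in range(8)]  # synthetic 8x32 "text line"
    label, scores = classify_image(line_image, toy_patch_classifier, 8, 8, 4)
    print(label, scores)
```

Choosing a stride smaller than the patch width makes neighboring patches overlap, so each region of the line or page contributes to several predictions before they are averaged into one page- or line-level decision.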



research
12/28/2020

Convolutional Neural Networks in Multi-Class Classification of Medical Data

We report applications of Convolutional Neural Networks (CNN) to multi-c...
research
09/22/2020

TSV Extrusion Morphology Classification Using Deep Convolutional Neural Networks

In this paper, we utilize deep convolutional neural networks (CNNs) to c...
research
10/14/2018

Fine-Grained Classification of Cervical Cells Using Morphological and Appearance Based Convolutional Neural Networks

Fine-grained classification of cervical cells into different abnormality...
research
10/09/2017

Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features

For digitization of paper files via OCR, preservation of document contex...
research
07/04/2021

Robust End-to-End Offline Chinese Handwriting Text Page Spotter with Text Kernel

Offline Chinese handwriting text recognition is a long-standing research...
research
02/27/2018

Convolutional Neural Networks for Toxic Comment Classification

Flood of information is produced in a daily basis through the global Int...
