TextCaps: Handwritten Character Recognition with Very Small Datasets

04/17/2019
by Vinoj Jayasundara et al.

Many localized languages struggle to reap the benefits of recent advances in character recognition because they lack a substantial amount of labeled training data. This is due both to the difficulty of generating large amounts of labeled data for such languages and to the inability of deep learning techniques to learn properly from a small number of training samples. We address this problem with a technique that generates new training samples from existing ones. By adding random, controlled noise to the samples' instantiation parameters, we produce realistic augmentations that reflect the actual variations present in human handwriting. Using only 200 training samples per class, our results surpass existing character recognition results on the EMNIST-letters dataset and match existing results on three other datasets: EMNIST-balanced, EMNIST-digits, and MNIST. We also develop a strategy for combining several loss functions effectively to improve reconstructions. Our system is useful for character recognition in localized languages that lack abundant labeled training data, and it also applies to related, more general contexts such as object recognition.
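The augmentation idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the authors' implementation. It assumes a trained decoder that maps a vector of instantiation parameters back to an image. In a capsule network, for example, that vector would be the activity vector of the predicted class capsule. It also assumes that "controlled noise" means a small, bounded, uniform perturbation applied to one dimension at a time. The decoder `make_toy_decoder` is a hypothetical stand-in: a fixed random linear map followed by a sigmoid. `combined_loss` shows one possible weighted combination of reconstruction losses, mean squared error plus binary cross-entropy with an assumed weight `alpha`. The abstract does not say which losses the paper actually combines.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def perturb_instantiation_params(
    params: Sequence[float], dim: int, noise_range: float, rng: random.Random
) -> Vector:
    """Copy `params` and add uniform noise in [-noise_range, noise_range] to one dimension.

    Assumption: 'controlled noise' is modeled as a bounded perturbation of a single
    instantiation parameter; the paper's exact scheme may differ.
    """
    perturbed = list(params)
    perturbed[dim] += rng.uniform(-noise_range, noise_range)
    return perturbed


def augment_sample(
    params: Sequence[float],
    decoder: Callable[[Sequence[float]], Vector],
    n_new: int,
    noise_range: float = 0.05,
    seed: int = 0,
) -> List[Vector]:
    """Decode `n_new` perturbed copies of `params` into new synthetic images."""
    rng = random.Random(seed)
    images = []
    for i in range(n_new):
        dim = i % len(params)  # cycle through the dimensions so each one gets varied
        images.append(decoder(perturb_instantiation_params(params, dim, noise_range, rng)))
    return images


def make_toy_decoder(n_params: int, n_pixels: int, seed: int = 42) -> Callable[[Sequence[float]], Vector]:
    """Hypothetical stand-in for a trained decoder: fixed random linear map + sigmoid."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_params)] for _ in range(n_pixels)]

    def decode(params: Sequence[float]) -> Vector:
        # The sigmoid keeps every output pixel intensity strictly between 0 and 1.
        return [1.0 / (1.0 + math.exp(-sum(w * p for w, p in zip(row, params)))) for row in weights]

    return decode


def combined_loss(
    target: Sequence[float], recon: Sequence[float], alpha: float = 0.5, eps: float = 1e-7
) -> float:
    """Assumed example: alpha * MSE + (1 - alpha) * binary cross-entropy, averaged over pixels."""
    n = len(target)
    clipped = [min(max(r, eps), 1.0 - eps) for r in recon]  # avoid log(0) in the BCE term
    mse = sum((t - r) ** 2 for t, r in zip(target, recon)) / n
    bce = -sum(t * math.log(r) + (1.0 - t) * math.log(1.0 - r) for t, r in zip(target, clipped)) / n
    return alpha * mse + (1.0 - alpha) * bce


if __name__ == "__main__":
    decoder = make_toy_decoder(n_params=16, n_pixels=28 * 28)
    params = [0.1] * 16  # illustrative instantiation parameters for one training sample
    new_images = augment_sample(params, decoder, n_new=5)
    print(len(new_images), len(new_images[0]))  # 5 784
    print(combined_loss(decoder(params), new_images[0]))
```

Each synthetic sample stays close to the original because only one parameter is varied at a time, and cycling through the dimensions still explores every parameter. In a real model, the noise range and the loss weight `alpha` would need tuning so that the decoded characters remain legible.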
