SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

by Bao Hieu Tran, et al.

In recent decades, scene text recognition has gained worldwide attention from both the academic community and end users because of its importance in a wide range of applications. Despite advances in optical character recognition, scene text recognition remains challenging due to inherent problems such as distortion and irregular layout. Most existing approaches rely on recurrent or convolutional neural networks. However, recurrent neural networks (RNNs) usually train slowly because of their sequential computation and suffer from problems such as vanishing gradients and bottlenecks, while convolutional neural networks (CNNs) face a trade-off between complexity and performance. In this paper, we introduce SAFL, a self-attention-based neural network model with focal loss for scene text recognition, to overcome the limitations of existing approaches. Using focal loss instead of negative log-likelihood helps the model focus more on low-frequency samples during training. Moreover, to deal with distorted and irregular text, we use a Spatial Transformer Network (STN) to rectify the text before passing it to the recognition network. We run experiments to evaluate the proposed model on seven benchmarks. The numerical results show that our model achieves the best performance.
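To illustrate why focal loss shifts attention toward hard or rare samples, the sketch below implements the standard focal loss, FL(p_t) = -(1 - p_t)^gamma * log(p_t), in plain Python. It is a minimal illustration, not the authors' implementation: the function name `focal_loss`, the default `gamma=2.0`, and the toy probabilities are all assumptions made for this example. Setting `gamma=0` recovers the ordinary negative log-likelihood.

```python
import math


def focal_loss(probs, targets, gamma=2.0):
    """Mean focal loss over a batch of per-character predictions.

    probs   -- list of rows, each row a list of class probabilities
    targets -- list of integer class indices, one per row
    gamma   -- focusing parameter (assumed value; gamma=0 gives plain NLL)
    """
    total = 0.0
    for row, target in zip(probs, targets):
        # Probability the model assigns to the correct class,
        # clamped away from zero so the logarithm stays finite.
        p_t = max(row[target], 1e-12)
        # The factor (1 - p_t)^gamma shrinks the loss of confident predictions.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(targets)


# Toy batch (hypothetical values): one easy sample and one hard sample.
probs = [[0.9, 0.1], [0.2, 0.8]]
targets = [0, 0]

nll = focal_loss(probs, targets, gamma=0.0)  # ordinary negative log-likelihood
fl = focal_loss(probs, targets, gamma=2.0)   # focal loss with the assumed gamma
```

In this toy batch the easy sample (p_t = 0.9) is scaled by (0.1)^2 and contributes almost nothing, while the hard sample (p_t = 0.2) keeps most of its weight, so the gradient is dominated by the prediction the model gets wrong.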



