From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network

08/22/2021
by   Yuxin Wang, et al.
0

In this paper, we abandon the dominant complex language model and rethink the linguistic learning process in the scene text recognition. Different from previous methods considering the visual and linguistic information in two separate structures, we propose a Visual Language Modeling Network (VisionLAN), which views the visual and linguistic information as a union by directly enduing the vision model with language capability. Specially, we introduce the text recognition of character-wise occluded feature maps in the training stage. Such operation guides the vision model to use not only the visual texture of characters, but also the linguistic information in visual context for recognition when the visual cues are confused (e.g. occlusion, noise, etc.). As the linguistic information is acquired along with visual features without the need of extra language model, VisionLAN significantly improves the speed by 39 and adaptively considers the linguistic information to enhance the visual features for accurate recognition. Furthermore, an Occlusion Scene Text (OST) dataset is proposed to evaluate the performance on the case of missing character-wise visual cues. The state of-the-art results on several benchmarks prove our effectiveness. Code and dataset are available at https://github.com/wangyuxin87/VisionLAN.

READ FULL TEXT
research
05/09/2023

Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition

Vision model have gained increasing attention due to their simplicity an...
research
05/08/2023

Scene Text Recognition with Image-Text Matching-guided Dictionary

Employing a dictionary can efficiently rectify the deviation between the...
research
08/03/2020

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

Scene text spotting aims to detect and recognize the entire word or sent...
research
04/06/2022

IterVM: Iterative Vision Modeling Module for Scene Text Recognition

Scene text recognition (STR) is a challenging problem due to the imperfe...
research
11/30/2021

Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features

Linguistic knowledge has brought great benefits to scene text recognitio...
research
04/05/2023

Efficient OCR for Building a Diverse Digital History

Thousands of users consult digital archives daily, but the information t...
research
04/12/2022

Open-set Text Recognition via Character-Context Decoupling

The open-set text recognition task is an emerging challenge that require...

Please sign up or login with your details

Forgot password? Click here to reset