End-to-End Subtitle Detection and Recognition for Videos in East Asian Languages via CNN Ensemble with Near-Human-Level Performance

11/18/2016
by   Yan Xu, et al.

In this paper, we propose an innovative end-to-end subtitle detection and recognition system for videos in East Asian languages. The system consists of multiple stages. Subtitles are first detected by a novel image operator based on the sequence information of consecutive video frames. Then, an ensemble of Convolutional Neural Networks (CNNs) trained on synthetic data is used to detect and recognize East Asian characters. Finally, a dynamic programming approach leveraging language models is applied to assemble the recognition results for entire text lines. The proposed system achieves average end-to-end accuracies of 98.2% [...] Simplified Chinese and 40 videos in Traditional Chinese, respectively, significantly outperforming existing methods. The near-perfect accuracy of our system dramatically narrows the gap between human cognitive ability and state-of-the-art algorithms for this task.
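
The final stage described above, dynamic programming guided by a language model, can be illustrated with a generic Viterbi-style decoder. This is a minimal sketch under stated assumptions, not the authors' actual algorithm: it assumes each character position already has a short list of candidate characters with classifier log-probabilities, and that the language model is a simple bigram table. The names `viterbi_decode`, `bigram_logp`, `lm_weight`, and `unseen_logp` are hypothetical and do not come from the paper.

```python
import math


def viterbi_decode(candidates, bigram_logp, lm_weight=1.0, unseen_logp=-10.0):
    """Pick the best character sequence for one text line.

    candidates:  list of positions; each position is a non-empty list of
                 (char, classifier_log_prob) pairs (hypothetical format).
    bigram_logp: dict mapping (prev_char, char) -> log-probability
                 (a stand-in for a real language model).
    unseen_logp: assumed penalty for bigrams missing from the table.
    """
    if not candidates:
        return ""

    # Each table maps char -> (best score so far, backpointer to previous char).
    prev = {ch: (lp, None) for ch, lp in candidates[0]}
    history = [prev]

    for position in candidates[1:]:
        cur = {}
        for ch, lp in position:
            best_score, best_prev = -math.inf, None
            for pch, (pscore, _) in prev.items():
                lm = bigram_logp.get((pch, ch), unseen_logp)
                score = pscore + lm_weight * lm + lp
                if score > best_score:
                    best_score, best_prev = score, pch
            cur[ch] = (best_score, best_prev)
        history.append(cur)
        prev = cur

    # Backtrack from the best final character.
    ch = max(prev, key=lambda c: prev[c][0])
    out = [ch]
    for table in reversed(history[1:]):
        ch = table[ch][1]
        out.append(ch)
    return "".join(reversed(out))


# Illustrative input: two visually similar candidates per position.
example_candidates = [
    [("曰", math.log(0.55)), ("日", math.log(0.45))],
    [("本", math.log(0.60)), ("木", math.log(0.40))],
]
example_bigrams = {("日", "本"): math.log(0.5)}

print(viterbi_decode(example_candidates, example_bigrams))
```

In this toy input the classifier slightly prefers 曰 in the first position, but the bigram table favors the pair 日本, so the decoder can override the locally best character. Setting `lm_weight=0.0` disables the language model and falls back to the per-position classifier choice.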


