PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer

by Ruijin Liu, et al.

We present PBFormer, an efficient yet powerful scene text detector that unifies the transformer with a novel text shape representation, the Polynomial Band (PB). The representation uses four polynomial curves to fit a text's top, bottom, left, and right sides, and can capture a text with a complex shape by varying the polynomial coefficients. PB has appealing features compared with conventional representations: 1) It can model different curvatures with a fixed number of parameters, whereas polygon-point-based methods need a different number of points. 2) It can distinguish adjacent or overlapping texts because they have clearly different curve coefficients, whereas segmentation-based or point-based methods struggle when text instances adhere to one another spatially. PBFormer combines the PB with the transformer and directly generates smooth text contours sampled from the predicted curves, without interpolation. A parameter-free cross-scale pixel attention (CPA) module highlights the feature map of a suitable scale while suppressing the other feature maps. This simple operation helps detect small-scale texts and is compatible with the one-stage DETR framework, which needs no NMS post-processing. Furthermore, PBFormer is trained with a shape-contained loss, which not only enforces piecewise alignment between the ground-truth and predicted curves but also makes the curves' positions and shapes consistent with each other. Without bells and whistles such as text pre-training, our method outperforms previous state-of-the-art text detectors on arbitrary-shaped text datasets.
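To make the fixed-parameter property concrete, here is a minimal sketch of how a contour of any density could be sampled from four polynomial sides. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the polynomial degree, the coordinate convention, or how each side's extent is determined, so the sketch assumes quadratic curves, top and bottom sides of the form y = p(x), left and right sides of the form x = q(y), and sampling ranges supplied by the caller. The names `sample_side` and `sample_band` are hypothetical.

```python
import numpy as np

def sample_side(coeffs, t_start, t_end, n_points, along_x=True):
    """Sample n_points on one polynomial side.

    coeffs: polynomial coefficients, highest degree first (np.polyval order).
    along_x=True  -> curve is y = p(x) (assumed here for top/bottom sides).
    along_x=False -> curve is x = q(y) (assumed here for left/right sides).
    """
    t = np.linspace(t_start, t_end, n_points)
    v = np.polyval(coeffs, t)
    return np.stack([t, v], axis=1) if along_x else np.stack([v, t], axis=1)

def sample_band(band, x_range, y_range, n_points):
    """Return a (4 * n_points, 2) array of (x, y) points, ordered
    top -> right -> bottom -> left so the points trace the outline."""
    x0, x1 = x_range
    y0, y1 = y_range
    sides = [
        sample_side(band["top"], x0, x1, n_points, along_x=True),
        sample_side(band["right"], y0, y1, n_points, along_x=False),
        sample_side(band["bottom"], x1, x0, n_points, along_x=True),
        sample_side(band["left"], y1, y0, n_points, along_x=False),
    ]
    return np.concatenate(sides, axis=0)

# Hypothetical curved text region: 12 coefficients in total,
# regardless of how many contour points are sampled.
band = {
    "top": [0.01, -0.2, 11.0],     # y = 0.01 x^2 - 0.2 x + 11
    "bottom": [0.01, -0.2, 31.0],  # y = 0.01 x^2 - 0.2 x + 31
    "left": [0.0, 0.0, 0.0],       # x = 0
    "right": [0.0, 0.0, 20.0],     # x = 20
}
coarse = sample_band(band, (0.0, 20.0), (11.0, 31.0), n_points=5)
fine = sample_band(band, (0.0, 20.0), (11.0, 31.0), n_points=50)
```

Both `coarse` and `fine` come from the same twelve coefficients; only the sampling density changes. A polygon-point representation would instead need more stored points to describe the same curvature more finely.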



Efficient and Accurate Scene Text Detection with Low-Rank Approximation Network

Recently, regression-based methods, which predict parameter curves for l...

Which and Where to Focus: A Simple yet Accurate Framework for Arbitrary-Shaped Nearby Text Detection in Scene Images

Scene text detection has drawn the close attention of researchers. Thoug...

RayNet: Real-time Scene Arbitrary-shape Text Detection with Multiple Rays

Existing object detection-based text detectors mainly concentrate on det...

Attention-based Feature Decomposition-Reconstruction Network for Scene Text Detection

Recently, scene text detection has been a challenging task. Texts with a...

Arbitrary Shape Text Detection using Transformers

Recent text detection frameworks require several handcrafted components ...

The Vector Space of Convex Curves: How to Mix Shapes

We present a novel, log-radius profile representation for convex curves ...

End-to-End Vectorized HD-map Construction with Piecewise Bezier Curve

Vectorized high-definition map (HD-map) construction, which focuses on t...
