Pyramid Transformer for Traffic Sign Detection

07/13/2022
by Omid Nejati Manzari, et al.

Traffic sign detection is a vital task in the visual systems of self-driving cars and automated driving systems. Recently, Transformer-based models have achieved encouraging results on various computer vision tasks. However, we observe that the vanilla ViT does not yield satisfactory results in traffic sign detection, because the datasets are very small overall and the class distribution of traffic signs is extremely unbalanced. To overcome this problem, this paper proposes a novel Pyramid Transformer with locality mechanisms. Specifically, Pyramid Transformer has several spatial pyramid reduction layers that use atrous convolutions to shrink and embed the input image into tokens with rich multi-scale context. Moreover, it inherits an intrinsic scale-invariance inductive bias and is able to learn local feature representations for objects at various scales, thereby making the network more robust to the size discrepancy of traffic signs. The experiments are conducted on the German Traffic Sign Detection Benchmark (GTSDB). The results demonstrate the superiority of the proposed model on the traffic sign detection task. More specifically, Pyramid Transformer achieves 75.6 when applied as the backbone of Cascade RCNN, surpassing most well-known and widely used state-of-the-art models.
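To make the idea of a pyramid reduction layer more concrete, the following is a minimal 1D toy sketch in plain Python, not the authors' implementation. It assumes that each reduction step runs several atrous (dilated) convolutions in parallel, subsamples the result, and concatenates the branch outputs into one token per kept position. The function names `atrous_conv1d` and `pyramid_reduce`, the dilation rates (1, 2, 3), and the stride of 2 are all hypothetical choices made for illustration.

```python
def atrous_conv1d(signal, kernel, dilation):
    """'Same'-padded 1D atrous (dilated) convolution, cross-correlation form.

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    # Zero padding that keeps the output the same length as the input.
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def pyramid_reduce(signal, kernel, dilations=(1, 2, 3), stride=2):
    """Hypothetical reduction step (illustrative, not from the paper).

    One branch per dilation rate; positions are subsampled by `stride`,
    and each output token concatenates the values of all branches.
    """
    branches = [atrous_conv1d(signal, kernel, d) for d in dilations]
    return [
        [branch[i] for branch in branches]
        for i in range(0, len(signal), stride)
    ]


# A single impulse at index 3: larger dilations "see" it from farther away.
signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
tokens = pyramid_reduce(signal, kernel=[1.0, 1.0, 1.0])
for position, token in zip(range(0, len(signal), 2), tokens):
    print(position, token)
```

In this toy, the token at position 0 only registers the impulse through the branch with dilation 3, while the token at position 2 registers it through the branch with dilation 1. That is the sense in which a single token can carry context from several scales at once; a real model would use learned 2D convolutions over image feature maps.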

research
06/07/2021

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

Transformers have shown great potential in various computer vision tasks...
research
12/16/2021

Improved YOLOv5 network for real-time multi-scale traffic sign detection

Traffic sign detection is a challenging task for the unmanned driving sy...
research
07/29/2021

PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion

The Transformer architecture has achieved rapid development in recent yea...
research
01/27/2023

Robust Transformer with Locality Inductive Bias and Feature Normalization

Vision transformers have been demonstrated to yield state-of-the-art res...
research
02/12/2022

Multi-direction and Multi-scale Pyramid in Transformer for Video-based Pedestrian Retrieval

In video surveillance, pedestrian retrieval (also called person re-ident...
research
06/22/2021

P2T: Pyramid Pooling Transformer for Scene Understanding

This paper jointly resolves two problems in vision transformer: i) the c...
research
11/10/2015

Traffic Sign Classification Using Deep Inception Based Convolutional Networks

In this work, we propose a novel deep network for traffic sign classific...
