Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver

by Xianpeng Liu, et al.

The main challenge in monocular 3D object detection is accurate localization of the 3D center. Motivated by a new and strong observation that, in an ideal case, this challenge can be remedied by a local-grid search scheme in 3D space, we propose a stage-wise approach that combines the information flow from 2D to 3D (3D bounding box proposal generation from a single 2D image) and from 3D to 2D (proposal verification by denoising with 3D-to-2D contexts) in a top-down manner. Specifically, we first obtain initial proposals from off-the-shelf backbone monocular 3D detectors. Then, we generate a 3D anchor space by local-grid sampling around the initial proposals. Finally, we perform 3D bounding box denoising at the 3D-to-2D proposal verification stage. To effectively learn discriminative features for denoising highly overlapped proposals, we use the Perceiver I/O model to fuse the 3D-to-2D geometric information with the 2D appearance information. Given the encoded latent representation of a proposal, the verification head is implemented with a self-attention module. Our method, named MonoXiver, is generic and can be easily adapted to any backbone monocular 3D detector. Experimental results on the well-established KITTI dataset and the challenging large-scale Waymo dataset show that MonoXiver consistently achieves improvement with limited computation overhead.
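The local-grid sampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the grid step, the grid radius, and the choice to vary only the two ground-plane coordinates are all assumptions made for illustration. The oracle selection at the end stands in for the "ideal case" mentioned in the abstract, where the best anchor is picked using the ground-truth center; in the actual method that role is played by the learned verification stage.

```python
from itertools import product
from typing import List, Tuple

Center = Tuple[float, float, float]  # (x, y, z) in camera coordinates (assumed convention)


def local_grid_anchors(center: Center, step: float = 0.5, radius: int = 1) -> List[Center]:
    """Return candidate 3D centers on a regular grid around an initial proposal.

    Hypothetical helper: `step` (grid spacing) and `radius` (grid cells per
    side) are illustrative parameters, not values from the paper.
    """
    x, y, z = center
    offsets = [i * step for i in range(-radius, radius + 1)]
    # Assumption: shift only x and z (the ground plane) and keep the height y fixed.
    return [(x + dx, y, z + dz) for dx, dz in product(offsets, offsets)]


def oracle_best_anchor(anchors: List[Center], gt_center: Center) -> Center:
    """Pick the anchor closest to the ground-truth center (ideal-case selection)."""
    def squared_distance(a: Center) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, gt_center))
    return min(anchors, key=squared_distance)


# Example: an initial proposal that is off by roughly half a grid step in x and z.
initial = (1.0, 1.5, 20.0)
anchors = local_grid_anchors(initial, step=0.5, radius=1)
refined = oracle_best_anchor(anchors, gt_center=(1.4, 1.5, 20.6))
```

With a grid radius of 1 the sketch produces 9 anchors per proposal, including the original center, so the anchor set can never be worse than the initial proposal under oracle selection.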



Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction

We present MonoPSR, a monocular 3D object detection method that leverage...

Multi-Grid Redundant Bounding Box Annotation for Accurate Object Detection

Modern leading object detectors are either two-stage or one-stage networ...

Boundary Distribution Estimation to Precise Object Detection

In principal modern detectors, the task of object localization is implem...

Object-Aware Centroid Voting for Monocular 3D Object Detection

Monocular 3D object detection aims to detect objects in a 3D physical wo...

Deep Fitting Degree Scoring Network for Monocular 3D Object Detection

In this paper, we propose to learn a deep fitting degree scoring network...

Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention

Referring Expression Comprehension (REC) has become one of the most impo...

Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection

Monocular 3D object detection aims to localize 3D bounding boxes in an i...
