Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection

by   Xianpeng Liu, et al.

Monocular 3D object detection aims to localize 3D bounding boxes in an input single 2D image. It is a highly challenging problem and remains open, especially when no extra information (e.g., depth, lidar and/or multi-frames) can be leveraged in training and/or inference. This paper proposes a simple yet effective formulation for monocular 3D object detection without exploiting any extra information. It presents the MonoCon method which learns Monocular Contexts, as auxiliary tasks in training, to help monocular 3D object detection. The key idea is that with the annotated 3D bounding boxes of objects in an image, there is a rich set of well-posed projected 2D supervision signals available in training, such as the projected corner keypoints and their associated offset vectors with respect to the center of 2D bounding box, which should be exploited as auxiliary tasks in training. The proposed MonoCon is motivated by the Cramer-Wold theorem in measure theory at a high level. In implementation, it utilizes a very simple end-to-end design to justify the effectiveness of learning auxiliary monocular contexts, which consists of three components: a Deep Neural Network (DNN) based feature backbone, a number of regression head branches for learning the essential parameters used in the 3D bounding box prediction, and a number of regression head branches for learning auxiliary contexts. After training, the auxiliary context regression branches are discarded for better inference efficiency. In experiments, the proposed MonoCon is tested in the KITTI benchmark (car, pedestrain and cyclist). It outperforms all prior arts in the leaderboard on car category and obtains comparable performance on pedestrian and cyclist in terms of accuracy. Thanks to the simple design, the proposed MonoCon method obtains the fastest inference speed with 38.7 fps in comparisons


page 2

page 12

page 13

page 14


Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation

Monocular 3D object detection task aims to predict the 3D bounding boxes...

Object Viewpoint Classification Based 3D Bounding Box Estimation for Autonomous Vehicles

3D object detection is one of the most important tasks for the perceptio...

OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection

Despite monocular 3D object detection having recently made a significant...

Object-Aware Centroid Voting for Monocular 3D Object Detection

Monocular 3D object detection aims to detect objects in a 3D physical wo...

Reinforced Axial Refinement Network for Monocular 3D Object Detection

Monocular 3D object detection aims to extract the 3D position and proper...

Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver

The main challenge of monocular 3D object detection is the accurate loca...

Polygon Intersection-over-Union Loss for Viewpoint-Agnostic Monocular 3D Vehicle Detection

Monocular 3D object detection is a challenging task because depth inform...

Please sign up or login with your details

Forgot password? Click here to reset