Multi-Modal 3D Object Detection by Box Matching

by Zhe Liu, et al.

Multi-modal 3D object detection has received growing attention because the information from different sensors, such as LiDAR and cameras, is complementary. Most fusion methods for 3D detection rely on accurate alignment and calibration between 3D point clouds and RGB images. However, this assumption is not reliable in a real-world self-driving system, as the alignment between modalities is easily affected by asynchronous sensors and disturbed sensor placement. We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection, which offers an alternative approach to cross-modal feature alignment: it learns the correspondence at the bounding-box level, removing the dependency on calibration during inference. With the learned assignments between 3D and 2D object proposals, fusion for detection can be performed effectively by combining their ROI features. Extensive experiments on the nuScenes dataset demonstrate that our method is much more stable than existing fusion methods in challenging cases such as asynchronous sensors, misaligned sensor placement, and degraded camera images. We hope that FBMNet can provide a viable solution for handling these challenging cases safely in real autonomous driving scenarios. Codes will be publicly available at
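To make the box-matching idea concrete, the sketch below shows one way proposal-level fusion could work: score the similarity between every 3D and 2D proposal's ROI features, solve a one-to-one assignment, and fuse matched features. This is a minimal illustration under assumed details (cosine similarity, Hungarian assignment, averaging as the fusion step, and the function name `match_and_fuse`), not FBMNet's actual implementation, which learns the assignment end-to-end.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_and_fuse(roi_3d, roi_2d):
    """Toy box-level matching and fusion (hypothetical sketch, not the paper's code).

    roi_3d: (N, D) ROI features of 3D proposals from the LiDAR branch.
    roi_2d: (M, D) ROI features of 2D proposals from the image branch.
    Returns fused (N, D) features and the list of matched (3d_idx, 2d_idx) pairs.
    """
    # Cosine similarity between every 3D/2D proposal pair.
    a = roi_3d / np.linalg.norm(roi_3d, axis=1, keepdims=True)
    b = roi_2d / np.linalg.norm(roi_2d, axis=1, keepdims=True)
    sim = a @ b.T

    # Hungarian assignment maximises total similarity (minimise -sim).
    rows, cols = linear_sum_assignment(-sim)

    fused = roi_3d.copy()
    for r, c in zip(rows, cols):
        # Simple fusion by averaging; concatenation + projection is another option.
        fused[r] = 0.5 * (roi_3d[r] + roi_2d[c])
    return fused, list(zip(rows, cols))
```

Because the assignment is computed from learned features rather than from a projection of 3D boxes into the image plane, no camera-LiDAR calibration matrix is needed at inference time, which is the property the abstract highlights.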

