Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection

10/18/2022
by Xin Li, et al.

Multi-modal 3D object detection has been an active research topic in autonomous driving. Nevertheless, exploring cross-modal feature fusion between sparse 3D points and dense 2D pixels is non-trivial. Recent approaches either fuse image features with point cloud features projected onto the 2D image plane or combine the sparse point cloud with dense image pixels. These fusion approaches often suffer from severe information loss, causing sub-optimal performance. To address these problems, we construct a homogeneous structure between the point cloud and images that avoids projective information loss by transforming camera features into the LiDAR 3D space. In this paper, we propose a homogeneous multi-modal feature fusion and interaction method (HMFI) for 3D object detection. Specifically, we first design an image voxel lifter module (IVLM) that lifts 2D image features into the 3D space and generates homogeneous image voxel features. Then, we fuse the voxelized point cloud features with the image features from different regions through a self-attention-based query fusion mechanism (QFM). Next, we propose a voxel feature interaction module (VFIM) that enforces consistency of the semantic information of identical objects across the homogeneous point cloud and image voxel representations, providing object-level alignment guidance for cross-modal feature fusion and strengthening discriminative ability in complex backgrounds. We conduct extensive experiments on the KITTI and Waymo Open datasets, and the proposed HMFI achieves better performance than state-of-the-art multi-modal methods. In particular, for 3D detection of cyclists on the KITTI benchmark, HMFI surpasses all published algorithms by a large margin.
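The core idea behind the lifting step can be illustrated with a minimal sketch: project each LiDAR voxel center into the image plane with the camera projection matrix and sample the 2D feature map at that location, yielding an image feature grid homogeneous with the point cloud voxels. This is a nearest-neighbor sketch under assumed conventions (a 3x4 LiDAR-to-image projection matrix, zeros for out-of-view voxels); the function name and interface are illustrative, not the paper's actual IVLM implementation.

```python
import numpy as np

def lift_image_features_to_voxels(feat_map, voxel_centers, P):
    """Sketch of lifting 2D image features into 3D voxel space.

    feat_map:      (C, H, W) image feature map
    voxel_centers: (N, 3) voxel centers in LiDAR coordinates
    P:             (3, 4) LiDAR-to-image projection matrix (assumed)
    Returns (N, C) per-voxel image features; out-of-view voxels get zeros.
    """
    C, H, W = feat_map.shape
    N = voxel_centers.shape[0]
    # Project homogeneous voxel centers into the image plane.
    homog = np.hstack([voxel_centers, np.ones((N, 1))])  # (N, 4)
    proj = homog @ P.T                                   # (N, 3)
    depth = proj[:, 2]
    valid = depth > 1e-6                                 # in front of the camera
    uv = np.zeros((N, 2))
    uv[valid] = proj[valid, :2] / depth[valid, None]
    # Nearest-neighbor sampling (the paper may use a learned/interpolated variant).
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    in_view = valid & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = np.zeros((N, C), dtype=feat_map.dtype)
    out[in_view] = feat_map[:, v[in_view], u[in_view]].T
    return out
```

Once image features live on the same voxel grid as the point cloud, the two modalities can be fused region-by-region (here, by the QFM) without the projective information loss of pixel-plane fusion.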


Related research

PTA-Det: Point Transformer Associating Point cloud and Image for 3D Object Detection (01/18/2023)
In autonomous driving, 3D object detection based on multi-modal data has...

LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion (03/07/2023)
LiDAR-camera fusion methods have shown impressive performance in 3D obje...

SDVRF: Sparse-to-Dense Voxel Region Fusion for Multi-modal 3D Object Detection (04/17/2023)
In the perception task of autonomous driving, multi-modal methods have b...

MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection (07/18/2023)
In this paper, we propose a novel and effective Multi-Level Fusion netwo...

Structure Information is the Key: Self-Attention RoI Feature Extractor in 3D Object Detection (11/01/2021)
Unlike 2D object detection where all RoI features come from grid pixels,...

AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection (01/17/2022)
Object detection through either RGB images or the LiDAR point clouds has...

4D-Net for Learned Multi-Modal Alignment (09/02/2021)
We present 4D-Net, a 3D object detection approach, which utilizes 3D Poi...
