MAELi – Masked Autoencoder for Large-Scale LiDAR Point Clouds

by Georg Krispel, et al.

We show how the inherent, but often neglected, properties of large-scale LiDAR point clouds can be exploited for effective self-supervised representation learning. To this end, we design a highly data-efficient feature pre-training backbone that significantly reduces the amount of tedious 3D annotation needed to train state-of-the-art object detectors. In particular, we propose a Masked AutoEncoder (MAELi) that exploits the sparsity of LiDAR point clouds in both the encoder and the decoder during reconstruction. This results in more expressive and useful features that are directly applicable to downstream perception tasks, such as 3D object detection for autonomous driving. In a novel reconstruction scheme, MAELi distinguishes between free and occluded space and leverages a new masking strategy that targets the LiDAR's inherent spherical projection. To demonstrate the potential of MAELi, we pre-train one of the most widespread 3D backbones in an end-to-end fashion and show the merit of our fully unsupervised pre-trained features on several 3D object detection architectures. Given only a tiny fraction of labeled frames to fine-tune such detectors, we achieve significant performance improvements. For example, with only ∼800 labeled frames, MAELi features improve a SECOND model by +10.09 APH/LEVEL 2 on Waymo Vehicles.
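To make the idea of masking along the LiDAR's spherical projection more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that points are binned into azimuth/elevation cells as seen from the sensor origin and that whole cells are dropped at random. The function names (`spherical_bin`, `mask_by_angular_cells`), the bin counts, and the elevation range of -25° to 5° are all hypothetical choices made for illustration.

```python
import math
import random


def spherical_bin(point, az_bins=8, el_bins=4, el_min=-25.0, el_max=5.0):
    """Map an (x, y, z) point to an (azimuth, elevation) cell index.

    The bin counts and elevation range are illustrative assumptions,
    not values taken from the paper.
    """
    x, y, z = point
    dist = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x))  # in [-180, 180]
    elevation = math.degrees(math.asin(z / dist)) if dist > 0 else 0.0
    az_idx = min(int((azimuth + 180.0) / 360.0 * az_bins), az_bins - 1)
    el_frac = (elevation - el_min) / (el_max - el_min)
    # Clamp so points outside the assumed elevation range land in edge cells.
    el_idx = min(max(int(el_frac * el_bins), 0), el_bins - 1)
    return az_idx, el_idx


def mask_by_angular_cells(points, mask_ratio=0.5, az_bins=8, el_bins=4, seed=0):
    """Split points into (visible, masked) by dropping whole angular cells."""
    cells = [(a, e) for a in range(az_bins) for e in range(el_bins)]
    rng = random.Random(seed)
    n_masked = int(round(mask_ratio * len(cells)))
    masked_cells = set(rng.sample(cells, n_masked))
    visible, masked = [], []
    for p in points:
        cell = spherical_bin(p, az_bins, el_bins)
        (masked if cell in masked_cells else visible).append(p)
    return visible, masked


if __name__ == "__main__":
    gen = random.Random(1)
    cloud = [(gen.uniform(-50, 50), gen.uniform(-50, 50), gen.uniform(-2, 2))
             for _ in range(1000)]
    visible, masked = mask_by_angular_cells(cloud, mask_ratio=0.5)
    print(len(visible), len(masked))
```

Because a cell is defined by viewing angle alone, all points along the same sensor ray fall into the same cell regardless of range, which is the property that separates this kind of masking from masking on a uniform Cartesian grid.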




