Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training

02/27/2023
by   Ziyu Guo, et al.
0

Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for both 2D and 3D computer vision. However, existing MAE-style methods can only learn from the data of a single modality, i.e., either images or point clouds, which neglect the implicit semantic and geometric correlation between 2D and 3D. In this paper, we explore how the 2D modality can benefit 3D masked autoencoding, and propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training. Joint-MAE randomly masks an input 3D point cloud and its projected 2D images, and then reconstructs the masked information of the two modalities. For better cross-modal interaction, we construct our JointMAE by two hierarchical 2D-3D embedding modules, a joint encoder, and a joint decoder with modal-shared and model-specific decoders. On top of this, we further introduce two cross-modal strategies to boost the 3D representation learning, which are local-aligned attention mechanisms for 2D-3D semantic cues, and a cross-reconstruction loss for 2D-3D geometric constraints. By our pre-training paradigm, Joint-MAE achieves superior performance on multiple downstream tasks, e.g., 92.4 and 86.07

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/01/2022

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

Manual annotation of large-scale point cloud dataset for varying tasks s...
research
03/14/2023

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

Masked Autoencoders learn strong visual representations and achieve stat...
research
12/29/2022

Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation

The past few years have witnessed the prevalence of self-supervised repr...
research
10/09/2022

Let Images Give You More:Point Cloud Cross-Modal Training for Shape Analysis

Although recent point cloud analysis achieves impressive progress, the p...
research
10/03/2022

CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training

Pre-training across 3D vision and language remains under development bec...
research
05/28/2020

Self-supervised Modal and View Invariant Feature Learning

Most of the existing self-supervised feature learning methods for 3D dat...
research
11/13/2022

Point-DAE: Denoising Autoencoders for Self-supervised Point Cloud Learning

Masked autoencoder has demonstrated its effectiveness in self-supervised...

Please sign up or login with your details

Forgot password? Click here to reset