BEV-DG: Cross-Modal Learning under Bird's-Eye View for Domain Generalization of 3D Semantic Segmentation

08/12/2023
by Miaoyu Li, et al.

Cross-modal Unsupervised Domain Adaptation (UDA) exploits the complementarity of 2D-3D data to overcome the lack of annotation in a new domain. However, UDA methods require access to the target domain during training, so the trained model works only in that specific target domain. In light of this, we propose BEV-DG, cross-modal learning under bird's-eye view for Domain Generalization (DG) of 3D semantic segmentation. DG is more challenging because the model cannot access the target domain during training and must therefore rely on cross-modal learning to alleviate the domain gap. Since 3D semantic segmentation requires classifying each point, existing cross-modal learning is conducted directly point-to-point, which makes it sensitive to misalignment in the projections between pixels and points. To address this, our approach optimizes domain-irrelevant representation modeling with the aid of cross-modal learning under bird's-eye view. We propose BEV-based Area-to-area Fusion (BAF) to conduct cross-modal learning under bird's-eye view, which is more tolerant of point-level misalignment. Furthermore, to model domain-irrelevant representations, we propose BEV-driven Domain Contrastive Learning (BDCL), which builds on the same cross-modal learning under bird's-eye view. We design three domain generalization settings based on three 3D datasets, and BEV-DG outperforms state-of-the-art competitors by large margins in all settings.
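
The intuition behind area-level fusion can be illustrated with a small sketch. This is not the BAF module from the paper, whose details are not given in the abstract. It only assumes, for illustration, that per-point features are averaged inside square bird's-eye-view grid cells. The function name bev_area_pool and the cell_size parameter are hypothetical.

```python
import numpy as np

def bev_area_pool(points_xy, feats, cell_size=1.0):
    """Average per-point features inside each occupied BEV grid cell.

    Illustrative assumption only, not the paper's BAF implementation.
    points_xy: (N, 2) array of ground-plane x, y coordinates.
    feats:     (N, C) array of per-point features (from either modality).
    Returns (cell_feats, point_to_cell): (M, C) mean features per occupied
    cell, and for each point the index of the cell it falls into.
    """
    # Discretize ground-plane coordinates into integer cell indices.
    cells = np.floor(points_xy / cell_size).astype(np.int64)
    # Group points that share a cell; inverse maps each point to its cell row.
    uniq, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # some NumPy versions return shape (N, 1)
    # Sum features per cell, then divide by the number of points in the cell.
    sums = np.zeros((len(uniq), feats.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, feats)
    counts = np.bincount(inverse, minlength=len(uniq))
    return sums / counts[:, None], inverse

# Three points: two share a cell, one sits in a different cell.
xy = np.array([[0.2, 0.3], [0.7, 0.6], [3.1, 0.4]])
f = np.array([[1.0], [3.0], [10.0]])
cell_feats, point_to_cell = bev_area_pool(xy, f)

# A small shift that keeps every point in its cell leaves the pooled
# area features unchanged, unlike a strict point-to-point pairing.
shifted_feats, _ = bev_area_pool(xy + 0.1, f)
print(np.allclose(cell_feats, shifted_feats))
```

In a cross-modal setting, the same pooling could be applied to image features sampled at the projected points and to LiDAR features, so that the two modalities are compared per area rather than per point. That pairing is likewise an assumption of this sketch rather than a description of the authors' method.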

research
07/09/2023

Mx2M: Masked Cross-Modality Modeling in Domain Adaptation for 3D Semantic Segmentation

Existing methods of cross-modal domain adaptation for 3D semantic segmen...
research
08/05/2023

Cross-modal Cross-domain Learning for Unsupervised LiDAR Semantic Segmentation

In recent years, cross-modal domain adaptation has been studied on the p...
research
10/13/2022

X-Align: Cross-Modal Cross-View Alignment for Bird's-Eye-View Segmentation

Bird's-eye-view (BEV) grid is a common representation for the perception...
research
07/30/2021

Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation

Domain adaptation is critical for success when confronting with the lack...
research
11/07/2018

Y^2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences

A recent method employs 3D voxels to represent 3D shapes, but this limit...
research
11/28/2019

xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

Unsupervised Domain Adaptation (UDA) is crucial to tackle the lack of an...
research
03/21/2022

Drive Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation

This work investigates learning pixel-wise semantic image segmentation i...
