Open-Vocabulary Point-Cloud Object Detection without 3D Annotation

by Yuheng Lu, et al.

The goal of open-vocabulary detection is to identify novel objects based on arbitrary textual descriptions. In this paper, we address open-vocabulary 3D point-cloud detection with a divide-and-conquer strategy, which involves: 1) developing a point-cloud detector that can learn a general representation for localizing various objects, and 2) connecting textual and point-cloud representations to enable the detector to classify novel object categories based on text prompting. Specifically, we draw on rich image pre-trained models: the point-cloud detector learns to localize objects under the supervision of 2D bounding boxes predicted by 2D pre-trained detectors. Moreover, we propose a novel de-biased triplet cross-modal contrastive learning scheme to connect the image, point-cloud, and text modalities, thereby enabling the point-cloud detector to benefit from vision-language pre-trained models, i.e., CLIP. This novel use of image and vision-language pre-trained models for point-cloud detectors allows for open-vocabulary 3D object detection without the need for 3D annotations. Experiments demonstrate that the proposed method improves by at least 3.03 points and 7.47 points over a wide range of baselines on the ScanNet and SUN RGB-D datasets, respectively. Furthermore, we provide a comprehensive analysis to explain why our approach works.
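
To make the idea of connecting the three modalities more concrete, the following is a minimal sketch of a cross-modal contrastive objective, not the paper's implementation. It assumes each detected object has one point-cloud feature, one image feature, and one text feature of the same dimension, and it uses a plain symmetric InfoNCE loss. The de-biasing step mentioned in the abstract is not specified there and is omitted here. The function names, the temperature value, and the choice to sum a point-image term and a point-text term are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss: row i of `a` and row i of `b` form a positive pair,
    and every other row in the batch acts as a negative."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # (N, N) similarity matrix

    def cross_entropy_on_diagonal(scores):
        scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


def triplet_contrastive_loss(point_feats, image_feats, text_feats, temperature=0.07):
    """Hypothetical combination: pull point-cloud features toward both the image
    and the text features of the same object (assumption, no de-biasing)."""
    return (info_nce(point_feats, image_feats, temperature)
            + info_nce(point_feats, text_feats, temperature))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 16))
    aligned = triplet_contrastive_loss(feats, feats, feats)
    mismatched = triplet_contrastive_loss(feats, feats[::-1], feats[::-1])
    print(f"aligned loss: {aligned:.4f}, mismatched loss: {mismatched:.4f}")
```

In this sketch the loss is lower when the three feature sets agree per object than when the pairings are scrambled, which is the property that lets text embeddings act as classifiers for point-cloud features at test time. If several objects in a batch share a class, their text features would be identical and would be treated as false negatives by this plain form, which is one kind of bias such a loss can suffer from.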




