Model-Agnostic Hierarchical Attention for 3D Object Detection

01/06/2023
by Manli Shu, et al.

Transformers, as versatile network architectures, have recently seen great success in 3D point cloud object detection. However, a plain transformer has no hierarchy, which makes it difficult to learn features at different scales and limits its ability to extract localized features. As a result, such detectors perform unevenly across object sizes, with inferior performance on smaller objects. In this work, we propose two novel attention mechanisms as modularized hierarchical designs for transformer-based 3D detectors. To enable feature learning at different scales, we propose Simple Multi-Scale Attention, which builds multi-scale tokens from a single-scale input feature. For localized feature aggregation, we propose Size-Adaptive Local Attention, which uses an adaptive attention range for every bounding box proposal. Both attention modules are model-agnostic network layers that can be plugged into existing point cloud transformers for end-to-end training. We evaluate our method on two widely used indoor 3D point cloud object detection benchmarks. By plugging our proposed modules into the state-of-the-art transformer-based 3D detector, we improve the previous best results on both benchmarks, with the largest improvement margin on small objects.
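The idea of an attention range that adapts to each bounding box proposal can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `size_adaptive_local_attention`, the axis-aligned box test, and the single-head attention without learned projections are all assumptions made for illustration. Each proposal query attends only to the points that fall inside its own box, so a small box yields a narrow attention range and a large box a wide one.

```python
import numpy as np

def size_adaptive_local_attention(queries, centers, half_sizes, points, feats):
    """Hypothetical sketch: each proposal attends only to points inside its box.

    queries:    (Q, D) proposal features
    centers:    (Q, 3) box centers
    half_sizes: (Q, 3) box half-extents (axis-aligned boxes are an assumption)
    points:     (N, 3) point coordinates
    feats:      (N, D) point features
    Returns (output of shape (Q, D), attention weights of shape (Q, N)).
    """
    # inside[q, n] is True when point n lies within box q on all three axes.
    offsets = np.abs(points[None, :, :] - centers[:, None, :])
    inside = np.all(offsets <= half_sizes[:, None, :], axis=-1)

    # Scaled dot-product scores; points outside the box get a large negative score.
    scores = queries @ feats.T / np.sqrt(queries.shape[1])
    scores = np.where(inside, scores, -1e9)

    # Masked softmax. A box containing no points yields an all-zero weight row.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True)) * inside
    weights = weights / np.maximum(weights.sum(axis=1, keepdims=True), 1e-12)
    return weights @ feats, weights

# Toy data: four points, two proposals centered at the origin with different sizes.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]])
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [9.0, 9.0]])
queries = np.array([[1.0, 0.0], [1.0, 0.0]])
centers = np.zeros((2, 3))
half_sizes = np.array([[0.5, 0.5, 0.5], [1.5, 1.5, 1.5]])

out, weights = size_adaptive_local_attention(queries, centers, half_sizes, points, feats)
```

In this toy setup the small box contains only the first point, so its output equals that point's feature, while the larger box mixes the three nearby points and ignores the distant one.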

research
04/24/2021

M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

We present a novel architecture for 3D object detection, M3DeTR, which c...
research
03/02/2022

3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification

Although accurate and fast point cloud classification is a fundamental t...
research
10/30/2021

PatchFormer: A Versatile 3D Transformer Based on Patch Attention

The 3D vision community is witnessing a modeling shift from CNNs to Trans...
research
04/12/2023

Multi-scale Geometry-aware Transformer for 3D Point Cloud Classification

Self-attention modules have demonstrated remarkable capabilities in capt...
research
03/28/2022

MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection

Human-Object Interaction (HOI) detection is the task of identifying a se...
research
03/02/2022

Improving Point Cloud Based Place Recognition with Ranking-based Loss and Large Batch Training

The paper presents a simple and effective learning-based method for comp...
research
05/05/2021

Learning Feature Aggregation for Deep 3D Morphable Models

3D morphable models are widely used for the shape representation of an o...
