HiT: Building Mapping with Hierarchical Transformers

by Mingming Zhang, et al.

Deep learning-based methods have been extensively explored in recent years for automatic building mapping from high-resolution remote sensing images. While most building mapping models produce vector polygons of buildings for geographic and mapping systems, dominant methods typically decompose polygonal building extraction into several sub-problems, including segmentation, polygonization, and regularization, which leads to complex inference procedures, low accuracy, and poor generalization. In this paper, we propose a simple and novel building mapping method with Hierarchical Transformers, called HiT, which improves polygonal building mapping quality from high-resolution remote sensing images. HiT builds on a two-stage detection architecture by adding a polygon head parallel to the classification and bounding box regression heads. HiT simultaneously outputs building bounding boxes and vector polygons and is fully end-to-end trainable. The polygon head formulates a building polygon as serialized vertices with a bidirectional characteristic, a simple and elegant polygon representation that avoids assuming a start or end vertex. Under this new perspective, the polygon head adopts a transformer encoder-decoder architecture to predict the serialized vertices, supervised by the designed bidirectional polygon loss. Furthermore, a hierarchical attention mechanism combined with convolution operations is introduced in the encoder of the polygon head, capturing more of the geometric structure of building polygons at the vertex and edge levels. Comprehensive experiments on two benchmarks (the CrowdAI and Inria datasets) demonstrate that our method achieves a new state of the art on both instance segmentation and polygonal metrics. Moreover, qualitative results verify the superiority and effectiveness of our model in complex scenes.
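The abstract names a bidirectional polygon loss but does not define it. As a rough illustration of why a polygon representation might avoid fixing a start vertex or traversal direction, the sketch below scores a predicted vertex sequence against every cyclic shift of the ground-truth polygon in both directions and keeps the smallest mean L1 distance. This is an assumption made for illustration, not the loss used in HiT; the function names `mean_l1` and `start_and_direction_invariant_loss` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mean_l1(a: List[Point], b: List[Point]) -> float:
    """Mean per-vertex L1 distance between two equally long vertex lists."""
    total = sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(a, b))
    return total / len(a)


def start_and_direction_invariant_loss(pred: List[Point], target: List[Point]) -> float:
    """Illustrative loss (NOT the paper's definition).

    Compares `pred` with every cyclic shift of `target`, traversed both
    forwards and backwards, and returns the smallest mean L1 distance.
    The same closed polygon therefore costs zero regardless of which
    vertex it starts at or which way around it is listed.
    """
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    n = len(target)
    best = float("inf")
    for sequence in (target, target[::-1]):  # both traversal directions
        for shift in range(n):  # every possible start vertex
            rotated = sequence[shift:] + sequence[:shift]
            best = min(best, mean_l1(pred, rotated))
    return best


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    # Same square, listed in the opposite direction from a different start vertex.
    same_square = [(1.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.0, 1.0)]
    print(start_and_direction_invariant_loss(same_square, square))
```

The exhaustive search over 2n alignments is only practical for short vertex lists; it is used here because it makes the invariance explicit, not because it reflects how the polygon head is trained.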




