High-Fidelity Visual Structural Inspections through Transformers and Learnable Resizers

by Kareem Eltouny, et al.

Visual inspection is the predominant technique for evaluating the condition of civil infrastructure. Recent advances in unmanned aerial vehicles (UAVs) and artificial intelligence have made visual inspections faster, safer, and more reliable. Camera-equipped UAVs are becoming the new industry standard, collecting massive amounts of visual data for human inspectors. Meanwhile, there has been significant research on autonomous visual inspection using deep learning algorithms, including semantic segmentation. While UAVs can capture high-resolution images of building façades, high-resolution segmentation is extremely challenging because of its high memory demands. Typically, images are uniformly downsized, at the price of losing fine local details. Conversely, breaking the images into multiple smaller patches can cause a loss of global contextual information. We propose a hybrid strategy that can adapt to different inspection tasks by managing the trade-off between global and local semantics. The framework comprises a compound, high-resolution deep learning architecture equipped with an attention-based segmentation model and learnable downsampler-upsampler modules designed for optimal efficiency and information retention. The framework also applies vision transformers to a grid of image crops, aiming for high-precision learning without downsizing. An augmented inference technique is used to boost performance and reduce the possible loss of context due to grid cropping. Comprehensive experiments have been performed on synthetic environments built from 3D physics-based graphics models in the Quake City dataset. The proposed framework is evaluated using several metrics on three segmentation tasks: component type, component damage state, and global damage (crack, rebar, spalling).
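
The grid-cropping idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the augmented inference works, so the sketch assumes one plausible variant, averaging class scores over several shifted crop grids so that crop borders fall in different places. The names `predict_on_grid`, `augmented_inference`, and `toy_predict`, and the `offsets` parameter, are hypothetical, and `toy_predict` is a stand-in for a real segmentation model.

```python
import numpy as np

def pad_to_multiple(image, crop):
    """Edge-pad an (H, W, C) image so H and W are multiples of `crop`."""
    h, w = image.shape[:2]
    return np.pad(image, ((0, (-h) % crop), (0, (-w) % crop), (0, 0)), mode="edge")

def predict_on_grid(image, crop, predict_fn, offset=0):
    """Run `predict_fn` on every crop of a grid shifted by `offset` pixels.

    `predict_fn` maps a (crop, crop, C) array to (crop, crop, K) class scores.
    Returns per-pixel scores of shape (H, W, K) for the original image.
    """
    h, w = image.shape[:2]
    # Shifting the grid is done by padding the top-left corner (an assumption).
    shifted = np.pad(image, ((offset, 0), (offset, 0), (0, 0)), mode="edge")
    padded = pad_to_multiple(shifted, crop)
    out = None
    for y in range(0, padded.shape[0], crop):
        for x in range(0, padded.shape[1], crop):
            scores = predict_fn(padded[y:y + crop, x:x + crop])
            if out is None:
                out = np.zeros(padded.shape[:2] + (scores.shape[-1],))
            out[y:y + crop, x:x + crop] = scores
    # Drop the padding so the output aligns with the input pixels.
    return out[offset:offset + h, offset:offset + w]

def augmented_inference(image, crop, predict_fn, offsets=(0,)):
    """Average scores over shifted grids, then take the per-pixel argmax."""
    total = sum(predict_on_grid(image, crop, predict_fn, o) for o in offsets)
    return (total / len(offsets)).argmax(axis=-1)

def toy_predict(crop_img):
    """Placeholder model: two-class scores from mean pixel intensity."""
    g = crop_img.mean(axis=-1)
    return np.stack([1.0 - g, g], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 2, size=(50, 70, 3)).astype(float)
    labels = augmented_inference(image, crop=32, predict_fn=toy_predict, offsets=(0, 16))
    print(labels.shape)
```

Because `toy_predict` is purely pixel-wise, shifting the grid does not change its output; with a real model whose predictions depend on context within each crop, averaging over offsets is what would soften the border effects.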



High-Resolution Vision Transformers for Pixel-Level Identification of Structural Components and Damage

Visual inspection is predominantly used to evaluate the state of civil s...

A Convolutional Cost-Sensitive Crack Localization Algorithm for Automated and Reliable RC Bridge Inspection

Bridges are an essential part of the transportation infrastructure and n...

Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation

Transformers have proved to be very effective for visual recognition tas...

SeaDroneSim: Simulation of Aerial Images for Detection of Objects Above Water

Unmanned Aerial Vehicles (UAVs) are known for their fast and versatile a...

Linear features segmentation from aerial images

The rapid development of remote sensing technologies have gained signifi...

Deployment of Aerial Robots during the Flood Disaster in Erftstadt / Blessem in July 2021

Climate change is leading to more and more extreme weather events such a...

Representation Separation for Semantic Segmentation with Vision Transformers

Vision transformers (ViTs) encoding an image as a sequence of patches br...
