Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization

by Yingying Zhu, et al.

In this work, we address an important but less explored problem: designing a simple yet effective backbone specifically for the cross-view geo-localization task. Existing methods for cross-view geo-localization are frequently characterized by 1) complicated methodologies, 2) GPU-consuming computations, and 3) a stringent assumption that aerial and ground images are centrally or orientation aligned. To address these three challenges in cross-view image matching, we propose a new backbone network, named Simple Attention-based Image Geo-localization network (SAIG). The proposed SAIG effectively represents long-range interactions among patches, as well as cross-view correspondence, with multi-head self-attention layers. The "narrow-deep" architecture of our SAIG improves feature richness without degrading performance, while its shallow and effective convolutional stem preserves locality, eliminating the loss of boundary information caused by patchifying. Our SAIG achieves state-of-the-art results on cross-view geo-localization while being far simpler than previous works. Furthermore, with only 15.9% of the model parameters and half of the output dimension compared to the state-of-the-art, the SAIG adapts well across multiple cross-view datasets without employing any well-designed feature aggregation modules or feature alignment algorithms. In addition, our SAIG attains competitive scores on image retrieval benchmarks, further demonstrating its generalizability. As a backbone network, our SAIG is both easy to follow and computationally lightweight, which is meaningful in practical scenarios. Moreover, we propose a simple Spatial-Mixed feature aggregation moDule (SMD) that can mix and project spatial information into a low-dimensional space to generate feature descriptors... (The code is available at
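
The idea of mixing spatial information and projecting it into a compact descriptor can be illustrated with a minimal sketch. This is not the authors' SMD, whose details the abstract does not give. It assumes a single linear mixing step across patch positions, followed by flattening and L2 normalization. The function names, the toy sizes `N`, `C`, `K`, the random weights, and the sharing of weights between the two views are all hypothetical choices made for illustration.

```python
import math
import random


def spatial_mix_descriptor(tokens, mix_weights):
    """Mix patch tokens across the spatial axis, then flatten and L2-normalize.

    tokens:      N patch tokens, each a list of C floats (stand-in backbone output).
    mix_weights: K rows of N floats; each row blends all N positions into one
                 mixed token (hypothetical learned weights, random in this demo).
    Returns a flat descriptor of length K * C with unit L2 norm.
    """
    n, c = len(tokens), len(tokens[0])
    if any(len(row) != n for row in mix_weights):
        raise ValueError("each mixing row needs one weight per patch token")
    # Spatial mixing: every output token is a weighted sum over all positions.
    mixed = [
        [sum(row[i] * tokens[i][j] for i in range(n)) for j in range(c)]
        for row in mix_weights
    ]
    flat = [v for token in mixed for v in token]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0  # guard against zero norm
    return [v / norm for v in flat]


def cosine_similarity(a, b):
    # Descriptors are unit-length, so the dot product equals the cosine.
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)
N, C, K = 16, 8, 4  # assumed toy sizes: 16 patches, 8 channels, 4 mixed tokens
weights = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(K)]

# Toy "ground" tokens and a slightly perturbed copy standing in for the
# matching "aerial" view (an assumption made only for this demo).
ground = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(N)]
aerial = [[g + 0.1 * rng.gauss(0, 1) for g in row] for row in ground]

d_ground = spatial_mix_descriptor(ground, weights)
d_aerial = spatial_mix_descriptor(aerial, weights)
print(len(d_ground), cosine_similarity(d_ground, d_aerial))
```

In this sketch the descriptor length is `K * C`, so reducing `K` below the number of patches `N` is what makes the output low-dimensional. Retrieval would then rank reference descriptors by cosine similarity to the query descriptor.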



Cross-view Geo-localization with Evolving Transformer

In this work, we address the problem of cross-view geo-localization, whi...

GeoCapsNet: Aerial to Ground view Image Geo-localization using Capsule Network

The task of cross-view image geo-localization aims to determine the geo-...

Lending Orientation to Neural Networks for Cross-view Geo-localization

This paper studies image-based geo-localization (IBL) problem using grou...

Hierarchical Attention Fusion for Geo-Localization

Geo-localization is a critical task in computer vision. In this work, we...

Cross-View Image Sequence Geo-localization

Cross-view geo-localization aims to estimate the GPS location of a query...

Localizing Visual Sounds the Easy Way

Unsupervised audio-visual source localization aims at localizing visible...

Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization

Cross-view geo-localization is to spot images of the same geographic tar...
