CMIR-NET: A Deep Learning Based Model For Cross-Modal Retrieval In Remote Sensing

by Ushasi Chaudhuri, et al.

We address the problem of cross-modal information retrieval in the domain of remote sensing. In particular, we are interested in two application scenarios: i) cross-modal retrieval between panchromatic (PAN) and multi-spectral imagery, and ii) multi-label image retrieval between very high resolution (VHR) images and speech-based label annotations. These multi-modal retrieval scenarios are more challenging than traditional uni-modal retrieval given the inherent differences in distribution between the modalities. However, with the growing availability of multi-source remote sensing data and the scarcity of semantic annotations, the task of multi-modal retrieval has recently become extremely important. In this regard, we propose a novel deep neural network based architecture that learns a discriminative shared feature space for all the input modalities, suitable for semantically coherent information retrieval. Extensive experiments are carried out on the benchmark large-scale PAN/multi-spectral DSRSID dataset and the multi-label UC-Merced dataset. Together with the Merced dataset, we generate a corpus of speech signals corresponding to the labels. Superior performance with respect to the current state-of-the-art is observed in all cases.
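The core idea of the abstract, projecting each modality into a common discriminative space and retrieving by nearest neighbors there, can be illustrated with a minimal sketch. This is not the paper's architecture: the projection heads below are untrained random linear maps standing in for the learned modality branches, and all dimensions and names (`W_pan`, `W_ms`, `embed`, `retrieve`) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes: PAN and multi-spectral descriptors, plus
# the shared embedding dimension (all chosen for illustration only).
DIM_PAN, DIM_MS, DIM_SHARED = 512, 256, 64

# Modality-specific projection heads. In the paper these would be
# learned network branches; here they are random, untrained weights.
W_pan = rng.standard_normal((DIM_PAN, DIM_SHARED)) / np.sqrt(DIM_PAN)
W_ms = rng.standard_normal((DIM_MS, DIM_SHARED)) / np.sqrt(DIM_MS)

def embed(x, W):
    """Project modality features into the shared space and L2-normalize."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def retrieve(query_z, gallery_z, k=5):
    """Return indices of the top-k gallery items by cosine similarity
    (dot product of L2-normalized embeddings)."""
    sims = gallery_z @ query_z
    return np.argsort(-sims)[:k]

# Toy cross-modal query: one PAN image against 100 multi-spectral images.
gallery = embed(rng.standard_normal((100, DIM_MS)), W_ms)
query = embed(rng.standard_normal(DIM_PAN), W_pan)
top5 = retrieve(query, gallery)
print(top5)
```

In the actual model, the branches would be trained so that semantically matching PAN and multi-spectral pairs land close together in the shared space, which is what makes the cosine ranking meaningful.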


A Novel Self-Supervised Cross-Modal Image Retrieval Method In Remote Sensing

Due to the availability of multi-modal remote sensing (RS) image archive...

X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data

This paper addresses the problem of semi-supervised transfer learning wi...

Deep Unsupervised Contrastive Hashing for Large-Scale Cross-Modal Text-Image Retrieval in Remote Sensing

Due to the availability of large-scale multi-modal data (e.g., satellite...

Scale-Semantic Joint Decoupling Network for Image-text Retrieval in Remote Sensing

Image-text retrieval in remote sensing aims to provide flexible informat...

Two Headed Dragons: Multimodal Fusion and Cross Modal Transactions

As the field of remote sensing is evolving, we witness the accumulation ...

Learning to Evaluate Performance of Multi-modal Semantic Localization

Semantic localization (SeLo) refers to the task of obtaining the most re...

Juggling With Representations: On the Information Transfer Between Imagery, Point Clouds, and Meshes for Multi-Modal Semantics

The automatic semantic segmentation of the huge amount of acquired remot...