Scalable Reverse Image Search Engine for NASAWorldview

by   Abhigya Sodani, et al.

Researchers often spend weeks sifting through decades of unlabeled satellite imagery(on NASA Worldview) in order to develop datasets on which they can start conducting research. We developed an interactive, scalable and fast image similarity search engine (which can take one or more images as the query image) that automatically sifts through the unlabeled dataset reducing dataset generation time from weeks to minutes. In this work, we describe key components of the end to end pipeline. Our similarity search system was created to be able to identify similar images from a potentially petabyte scale database that are similar to an input image, and for this we had to break down each query image into its features, which were generated by a classification layer stripped CNN trained in a supervised manner. To store and search these features efficiently, we had to make several scalability improvements. To improve the speed, reduce the storage, and shrink memory requirements for embedding search, we add a fully connected layer to our CNN make all images into a 128 length vector before entering the classification layers. This helped us compress the size of our image features from 2048 (for ResNet, which was initially tried as our featurizer) to 128 for our new custom model. Additionally, we utilize existing approximate nearest neighbor search libraries to significantly speed up embedding search. Our system currently searches over our entire database of images at 5 seconds per query on a single virtual machine in the cloud. In the future, we would like to incorporate a SimCLR based featurizing model which could be trained without any labelling by a human (since the classification aspect of the model is irrelevant to this use case).


page 3

page 4

page 5

page 6


Visual search over billions of aerial and satellite images

We present a system for performing visual search over billions of aerial...

A1: A Distributed In-Memory Graph Database

A1 is an in-memory distributed database used by the Bing search engine t...

Identical Image Retrieval using Deep Learning

In recent years, we know that the interaction with images has increased....

Fast Spectral Ranking for Similarity Search

Despite the success of deep learning on representing images for particul...

Curator: Creating Large-Scale Curated Labelled Datasets using Self-Supervised Learning

Applying Machine learning to domains like Earth Sciences is impeded by t...

Satellite Image Search in AgoraEO

The growing operational capability of global Earth Observation (EO) crea...

Fast and Scalable Image Search For Histology

The expanding adoption of digital pathology has enabled the curation of ...

Please sign up or login with your details

Forgot password? Click here to reset