End-to-end Semantic Object Detection with Cross-Modal Alignment

02/10/2023
by   Silvan Ferreira, et al.
0

Traditional semantic image search methods aim to retrieve images that match the meaning of the text query. However, these methods typically search for objects on the whole image, without considering the localization of objects within the image. This paper presents an extension of existing object detection models for semantic image search that considers the semantic alignment between object proposals and text queries, with a focus on searching for objects within images. The proposed model uses a single feature extractor, a pre-trained Convolutional Neural Network, and a transformer encoder to encode the text query. Proposal-text alignment is performed using contrastive learning, producing a score for each proposal that reflects its semantic alignment with the text query. The Region Proposal Network (RPN) is used to generate object proposals, and the end-to-end training process allows for an efficient and effective solution for semantic image search. The proposed model was trained end-to-end, providing a promising solution for semantic image search that retrieves images that match the meaning of the text query and generates semantically relevant object proposals.

READ FULL TEXT

page 4

page 9

page 12

research
09/27/2021

Text-based Person Search in Full Images via Semantic-Driven Proposal Generation

Finding target persons in full scene images with a query of text descrip...
research
06/20/2019

Pattern Spotting in Historical Documents Using Convolutional Models

Pattern spotting consists of searching in a collection of historical doc...
research
10/28/2019

Fine-Grained Object Detection over Scientific Document Images with Region Embeddings

We study the problem of object detection over scanned images of scientif...
research
03/07/2018

Object cosegmentation using deep Siamese network

Object cosegmentation addresses the problem of discovering similar objec...
research
04/08/2021

Prototypical Region Proposal Networks for Few-Shot Localization and Classification

Recently proposed few-shot image classification methods have generally f...
research
10/08/2019

TraffickCam: Explainable Image Matching For Sex Trafficking Investigations

Investigations of sex trafficking sometimes have access to photographs o...
research
05/12/2021

VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching

The prevailing framework for matching multimodal inputs is based on a tw...

Please sign up or login with your details

Forgot password? Click here to reset