An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep Reinforcement Learning

03/22/2017
by Fan Wu, et al.

We propose an end-to-end approach to the natural language object retrieval task, which localizes an object within an image according to a natural language description, i.e., a referring expression. Previous works divide this problem into two independent stages: first, compute region proposals from the image without consulting the language description; second, score the object proposals against the referring expression and choose the top-ranked proposals. Because the object proposals are generated independently of the referring expression, many of them are redundant or even irrelevant to the referred object. In this work, we train an agent with deep reinforcement learning that learns to move and reshape a bounding box to localize the object according to the referring expression. We incorporate both spatial and temporal context information into the training procedure. By simultaneously exploiting local visual information, the spatial and temporal context, and the referring expression as a prior, the agent selects an appropriate action to take at each time step. A special action is defined to indicate when the agent has found the referred object and to terminate the procedure. We evaluate our model on various datasets, and our algorithm significantly outperforms the compared algorithms. Notably, the accuracy improvements of our method over the recent methods GroundeR and SCRC on the ReferItGame dataset are 7.67
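The box-manipulation loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration rather than the authors' implementation: the action names, the step fraction `alpha`, the default image size, and the helper names `apply_action`, `iou`, and `run_episode`. The abstract states only that the agent moves and reshapes a bounding box and that one special action ends the procedure. The `policy` argument stands in for the trained network, which in the paper would also see visual features, context, and the referring expression.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

# Hypothetical action set: four moves, four reshapes, one terminating action.
ACTIONS = ("left", "right", "up", "down",
           "wider", "narrower", "taller", "shorter", "stop")


def apply_action(box: Box, action: str, alpha: float = 0.2,
                 image_size: Tuple[float, float] = (640.0, 480.0)) -> Box:
    """Return the box after one action; steps scale with the current box size."""
    x1, y1, x2, y2 = box
    dx = alpha * (x2 - x1)
    dy = alpha * (y2 - y1)
    if action == "left":
        x1, x2 = x1 - dx, x2 - dx
    elif action == "right":
        x1, x2 = x1 + dx, x2 + dx
    elif action == "up":
        y1, y2 = y1 - dy, y2 - dy
    elif action == "down":
        y1, y2 = y1 + dy, y2 + dy
    elif action == "wider":
        x1, x2 = x1 - dx / 2, x2 + dx / 2
    elif action == "narrower":
        x1, x2 = x1 + dx / 2, x2 - dx / 2
    elif action == "taller":
        y1, y2 = y1 - dy / 2, y2 + dy / 2
    elif action == "shorter":
        y1, y2 = y1 + dy / 2, y2 - dy / 2
    elif action != "stop":
        raise ValueError(f"unknown action: {action}")
    # Clip the box to the image boundaries.
    w, h = image_size
    return (min(max(x1, 0.0), w), min(max(y1, 0.0), h),
            min(max(x2, 0.0), w), min(max(y2, 0.0), h))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (a common localization measure)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def run_episode(policy: Callable[[Box, int], str], start: Box,
                max_steps: int = 20) -> Tuple[Box, int]:
    """Apply policy actions until it emits "stop" or the step budget runs out."""
    box = start
    for t in range(max_steps):
        action = policy(box, t)
        if action == "stop":
            return box, t
        box = apply_action(box, action)
    return box, max_steps
```

A trained agent would replace `policy` with a learned action-value function; `iou` is included only because overlap with the ground-truth box is a natural way to score such an episode, which is likewise an assumption here.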


research
09/02/2018

Natural Language Person Search Using Deep Reinforcement Learning

Recent success in deep reinforcement learning is having an agent learn h...
research
05/24/2017

Attention-based Natural Language Person Retrieval

Following the recent progress in image classification and captioning usi...
research
03/09/2021

Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning

In this paper, we are tackling the proposal-free referring expression gr...
research
11/13/2015

Natural Language Object Retrieval

In this paper, we address the task of natural language object retrieval,...
research
02/13/2013

Object Recognition with Imperfect Perception and Redundant Description

This paper deals with a scene recognition system in a robotics context. T...
research
10/24/2018

Resolving Referring Expressions in Images With Labeled Elements

Images may have elements containing text and a bounding box associated w...
research
04/10/2018

Outline Objects using Deep Reinforcement Learning

Image segmentation needs both local boundary position information and gl...
