Learning a Recurrent Visual Representation for Image Caption Generation

11/20/2014
by   Xinlei Chen, et al.

In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. We propose learning this mapping using a recurrent neural network. Unlike previous approaches that map both sentences and images to a common embedding, we enable the generation of novel sentences given an image. Using the same model, we can also reconstruct the visual features associated with an image given its visual description. We use a novel recurrent visual memory that automatically learns to remember long-term visual concepts to aid in both sentence generation and visual feature reconstruction. We evaluate our approach on several tasks, including sentence generation, sentence retrieval and image retrieval. We show state-of-the-art results for the task of generating novel image descriptions. When compared to human-generated captions, our automatically generated captions are preferred by humans over 19.8% of the time. Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.

research
12/07/2014

Deep Visual-Semantic Alignments for Generating Image Descriptions

We present a model that generates natural language descriptions of image...
research
08/08/2016

Learning Joint Representations of Videos and Sentences with Web Image Search

Our objective is video retrieval based on natural language queries. In a...
research
08/31/2018

Learning to Describe Differences Between Pairs of Similar Images

In this paper, we introduce the task of automatically generating text to...
research
09/05/2017

Predicting Visual Features from Text for Image and Video Caption Retrieval

This paper strives to find amidst a set of sentences the one best descri...
research
02/28/2015

Generating Multi-Sentence Lingual Descriptions of Indoor Scenes

This paper proposes a novel framework for generating lingual description...
research
03/15/2021

Knowledge driven Description Synthesis for Floor Plan Interpretation

Image captioning is a widely known problem in the area of AI. Caption ge...
research
11/09/2019

On Architectures for Including Visual Information in Neural Language Models for Image Description

A neural language model can be conditioned into generating descriptions ...
