GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story Generation

05/28/2018
by Taehyeong Kim, et al.

The task of multi-image cued story generation, as posed by the Visual Storytelling (VIST) challenge, is to compose multiple coherent sentences from a given sequence of images. The main difficulty is generating image-specific sentences that remain consistent with the context of the overall image sequence. Here we propose a deep learning model, GLAC Net, that generates visual stories by combining global-local (glocal) attention with a context cascading mechanism. The model incorporates two levels of attention, the overall-encoding level and the image-feature level, to construct image-dependent sentences. Whereas a standard attention configuration requires a large number of parameters, GLAC Net implements attention in a very simple way, via hard connections from the encoder outputs or image features to the sentence generators. The coherence of the generated story is further improved by serially conveying (cascading) information from the previous sentence to the next. We evaluate GLAC Net on the Visual Storytelling Dataset (VIST) and achieve very competitive results compared with state-of-the-art techniques.
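The glocal idea is simple enough to sketch in a few lines. Below is a minimal, illustrative PyTorch sketch of the mechanism described above, not the authors' implementation; the module names, feature dimensions, and the single-layer bi-LSTM encoder / LSTM decoder are assumptions made for clarity. Each sentence decoder receives a "glocal" vector formed by hard-concatenating the global encoder output for an image with that image's local feature, and the decoder state is cascaded from one sentence to the next.

```python
# Minimal sketch of glocal attention + context cascading (illustrative only).
# Assumes pre-extracted CNN image features; all dimensions are hypothetical.
import torch
import torch.nn as nn


class GLACSketch(nn.Module):
    def __init__(self, img_dim=2048, enc_dim=512, embed_dim=256,
                 hid_dim=512, vocab_size=10000):
        super().__init__()
        # Global context: bi-LSTM over the sequence of image features.
        self.encoder = nn.LSTM(img_dim, enc_dim, batch_first=True,
                               bidirectional=True)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # "Glocal" vector: hard concatenation of the global encoder output
        # and the local (raw) image feature -- no learned attention weights.
        glocal_dim = 2 * enc_dim + img_dim
        self.decoder = nn.LSTM(embed_dim + glocal_dim, hid_dim,
                               batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, img_feats, captions):
        # img_feats: (B, N, img_dim); captions: (B, N, T) token ids per image.
        B, N, _ = img_feats.shape
        enc_out, _ = self.encoder(img_feats)      # (B, N, 2*enc_dim) global context
        state = None                              # cascaded across sentences
        logits = []
        for i in range(N):
            # Glocal vector for image i (global context + local feature).
            glocal = torch.cat([enc_out[:, i], img_feats[:, i]], dim=-1)
            emb = self.embed(captions[:, i])      # (B, T, embed_dim)
            # Feed the glocal vector at every decoding step (hard connection).
            glocal_rep = glocal.unsqueeze(1).expand(-1, emb.size(1), -1)
            dec_out, state = self.decoder(
                torch.cat([emb, glocal_rep], dim=-1), state)
            logits.append(self.out(dec_out))      # (B, T, vocab)
        return torch.stack(logits, dim=1)         # (B, N, T, vocab)
```

Under these assumptions, a forward pass over a batch of five-image albums would be `model(img_feats, captions)` with `img_feats` of shape (batch, 5, 2048) and `captions` of shape (batch, 5, max_len); reusing `state` across the loop is what carries the previous sentence's context into the next one.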

Related research

08/04/2021 · Towards Coherent Visual Storytelling with Ordered Image Attention
We address the problem of visual storytelling, i.e., generating a story ...

08/03/2022 · Word-Level Fine-Grained Story Visualization
Story visualization aims to generate a sequence of images to narrate eac...

05/21/2018 · Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation
We propose a hierarchically structured reinforcement learning approach t...

06/03/2018 · Contextualize, Show and Tell: A Neural Visual Storyteller
We present a neural model for generating short stories from image sequen...

05/30/2018 · Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual Storytelling
Visual storytelling includes two important parts: coherence between the ...

09/18/2023 · Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis
The excellent text-to-image synthesis capability of diffusion models has...

07/12/2020 · Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation
Generating longer textual sequences when conditioned on the visual infor...
