Scene Graph to Image Generation with Contextualized Object Layout Refinement

by Maor Ivgi, et al.

Generating high-quality images from scene graphs, that is, graphs that describe multiple entities in complex relations, is a challenging task that has attracted substantial interest recently. Prior work trained such models with supervised learning, where the goal is to produce the exact target image layout for each scene graph, and it predicted object locations and shapes independently and in parallel. However, scene graphs are underspecified, so the same scene graph often occurs with many different target images in the training data. This leads to generated images with high inter-object overlap, empty areas, blurry objects, and overall compromised quality. In this work, we propose a method that alleviates these issues by generating all object layouts together and reducing the reliance on such supervision. Our model predicts layouts directly from embeddings (without predicting intermediate boxes) by gradually upsampling, refining, and contextualizing object layouts. It is trained with a novel adversarial loss that optimizes the interaction between object pairs. This improves coverage and removes overlaps while maintaining sensible contours and respecting object relations. We show empirically on the COCO-STUFF dataset that our approach substantially improves the quality of generated layouts as well as overall image quality: layout coverage improves by almost 20 points, and object overlap drops to negligible amounts. This leads to better image generation, relation fulfillment, and object quality.
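To make the two layout measures mentioned above concrete, here is a minimal Python sketch of how coverage and overlap could be computed from per-object binary masks. The abstract does not give the exact metric definitions, so the formulas below (fraction of canvas cells covered by at least one mask, and fraction claimed by two or more masks) are assumptions, and the names `layout_coverage`, `layout_overlap`, and the toy masks are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary H x W grid: 1 where the object is present


def layout_coverage(masks: List[Mask]) -> float:
    """Fraction of canvas cells covered by at least one object mask (assumed definition)."""
    height, width = len(masks[0]), len(masks[0][0])
    covered = sum(
        1
        for r in range(height)
        for c in range(width)
        if any(m[r][c] for m in masks)
    )
    return covered / (height * width)


def layout_overlap(masks: List[Mask]) -> float:
    """Fraction of canvas cells claimed by two or more object masks (assumed definition)."""
    height, width = len(masks[0]), len(masks[0][0])
    contested = sum(
        1
        for r in range(height)
        for c in range(width)
        if sum(m[r][c] for m in masks) >= 2
    )
    return contested / (height * width)


# Toy 4x4 layout with three hypothetical objects.
sky = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
grass = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
sheep = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]

print(layout_coverage([sky, grass, sheep]))  # 15 of 16 cells are covered
print(layout_overlap([sky, grass, sheep]))   # 3 of 16 cells are contested
```

Under these assumed definitions, a layout with empty areas scores low on coverage, and one with heavily intersecting objects scores high on overlap. These are the two failure modes the abstract attributes to predicting each object independently.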




