Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts

08/27/2019
by Sandro Pezzelle, et al.

This work aims at modeling how the meaning of gradable adjectives of size (`big', `small') can be learned from visually-grounded contexts. Inspired by cognitive and linguistic evidence showing that the use of these expressions relies on setting a threshold that depends on the specific context, we investigate the ability of multi-modal models to assess whether an object is `big' or `small' in a given visual scene. In contrast with the standard computational approach, which simplistically treats gradable adjectives as `fixed' attributes, we pose the problem as relational: to be successful, a model has to consider the full visual context. By means of four main tasks, we show that state-of-the-art models (but not a relatively strong baseline) can learn the function subtending the meaning of size adjectives, though their performance decreases as the tasks move from simple to more complex. Crucially, models fail to develop abstract representations of gradable adjectives that can be used compositionally.
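To make the threshold intuition concrete, here is a minimal Python sketch of a context-dependent size rule: an object is labeled relative to the other objects in the scene rather than by its absolute size. The function name, the parameter k, and the linear interpolation between the smallest and largest areas in the scene are illustrative assumptions for this sketch, not the ground-truth definition used in the paper.

# Illustrative sketch only: a context-dependent rule for labeling an object
# as 'big' or 'small'. The exact threshold function in MALeViC may differ.

def size_label(target_area, context_areas, k=0.5):
    """Label the target object given the areas of all objects in the scene.

    context_areas is assumed to include the target object's area.
    """
    max_a, min_a = max(context_areas), min(context_areas)
    # Context-dependent threshold: a point between the scene's extremes.
    threshold = min_a + k * (max_a - min_a)
    return "big" if target_area > threshold else "small"

# The same object can flip label depending on the scene it appears in:
print(size_label(40, [10, 20, 40]))    # 'big'   (largest object in the scene)
print(size_label(40, [40, 120, 200]))  # 'small' (same area, larger companions)

The point of the sketch is that the decision is relational: changing the visual context changes the label even though the target object itself is unchanged, which is why a model must attend to the full scene rather than treat `big'/`small' as fixed attributes.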
