Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes

11/22/2015
by Satwik Kottur, et al.

We propose a model to learn visually grounded word embeddings (vis-w2v) to capture visual notions of semantic relatedness. While word embeddings trained using text have been extremely successful, they cannot uncover notions of semantic relatedness implicit in our visual world. For instance, although "eats" and "stares at" seem unrelated in text, they share semantics visually. When people are eating something, they also tend to stare at the food. Grounding diverse relations like "eats" and "stares at" into vision remains challenging, despite recent progress in vision. We note that the visual grounding of words depends on semantics, and not the literal pixels. We thus use abstract scenes created from clipart to provide the visual grounding. We find that the embeddings we learn capture fine-grained, visually grounded notions of semantic relatedness. We show improvements over text-only word embeddings (word2vec) on three tasks: common-sense assertion classification, visual paraphrasing and text-based image retrieval. Our code and datasets are available online.
