Leverage Points in Modality Shifts: Comparing Language-only and Multimodal Word Representations

06/04/2023
by Aleksey Tikhonov, et al.

Multimodal embeddings aim to enrich the semantic information in neural representations of language compared to text-only models. While different embeddings exhibit different applicability and performance on downstream tasks, little is known about the systematic representation differences attributable to the visual modality. Our paper compares word embeddings from three vision-and-language models (CLIP, OpenCLIP and Multilingual CLIP) and three text-only models, covering both static (FastText) and contextual representations (multilingual BERT; XLM-RoBERTa). This is the first large-scale study of the effect of visual grounding on language representations, covering 46 semantic parameters. We identify meaning properties and relations that characterize words whose embeddings are most affected by the inclusion of the visual modality in the training data; that is, points where visual grounding turns out to be most important. We find that the effect of visual modality correlates most with denotational semantic properties related to concreteness, but is also detected for several specific semantic classes, as well as for valence, a sentiment-related connotational property of linguistic expressions.
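One way to quantify how much the visual modality shifts individual word representations, as the abstract describes, is to align the two embedding spaces and measure the per-word residual distance. The sketch below is a hypothetical illustration of that idea using orthogonal Procrustes alignment over random stand-in matrices; it is not the paper's exact procedure, and all names (`embedding_shift`, the toy data) are assumptions for demonstration.

```python
import numpy as np

def embedding_shift(A, B):
    """Align embedding matrix B to A with orthogonal Procrustes,
    then return each word's cosine distance after alignment.
    A, B: (n_words, dim) arrays with rows for the same vocabulary.
    Larger values mark words most affected by the modality change."""
    # Normalize rows so cosine geometry dominates the comparison.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    # Orthogonal Procrustes: rotation R minimizing ||B @ R - A||_F.
    U, _, Vt = np.linalg.svd(B.T @ A)
    B_aligned = B @ (U @ Vt)
    # Per-word cosine distance between the aligned representations.
    return 1.0 - np.sum(A * B_aligned, axis=1)

# Toy stand-ins for a text-only and a multimodal embedding table.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(100, 32))
multi_emb = text_emb + 0.1 * rng.normal(size=(100, 32))

shift = embedding_shift(text_emb, multi_emb)
# Words with the largest shift are candidate "leverage points";
# these scores could then be correlated with semantic parameters
# such as concreteness ratings.
top_words = np.argsort(shift)[::-1][:5]
```

In practice the per-word shift scores would be correlated against annotated semantic properties (concreteness, valence, semantic class) to find where grounding matters most, mirroring the analysis the abstract reports.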
