Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained language models

by Isabelle Lorge et al.

Vector space models of word meaning share the assumption that words occurring in similar contexts have similar meanings. In such models, words that are similar in their topical associations but differ in their logical force tend to emerge as semantically close, creating well-known challenges for NLP applications that involve logical reasoning. Modern pretrained language models, such as BERT, RoBERTa, and GPT-3, hold the promise of performing better on logical tasks than classic static word embeddings. However, reports of their success are mixed. In the current paper, we advance this discussion through a systematic study of scalar adverbs, an under-explored class of words with strong logical force. Using three different tasks, involving both naturalistic social media data and constructed examples, we investigate the extent to which BERT, RoBERTa, GPT-2, and GPT-3 exhibit general, human-like knowledge of these common words. We ask: 1) Do the models distinguish among the three semantic categories of MODALITY, FREQUENCY, and DEGREE? 2) Do they have implicit representations of full scales, from maximally negative to maximally positive? 3) How do word frequency and contextual factors impact model performance? We find that despite capturing some aspects of logical meaning, the models fall far short of human performance.




