Context vs Target Word: Quantifying Biases in Lexical Semantic Datasets

by Qianchu Liu, et al.

State-of-the-art contextualized models such as BERT use tasks such as WiC and WSD to evaluate their word-in-context representations. This practice assumes that performance on these tasks reflects how well a model represents the coupled semantics of the word and its context. This study investigates that assumption by presenting the first quantitative analysis, using probing baselines, of the context-word interaction tested in major contextual lexical semantic tasks. Specifically, based on probing baseline performance, we propose measures of the degree of context or word bias in a dataset, and we plot existing datasets on a continuum. The analysis shows that most existing datasets fall at the extreme ends of the continuum (i.e., they are either heavily context-biased or heavily target-word-biased), while only AM²iCo and Sense Retrieval challenge a model to represent both the context and the target word. Our case study on WiC reveals that human subjects do not share models' strong context bias on this dataset (humans found semantic judgments much more difficult when the target word was missing) and that models are learning spurious correlations from the context alone. This study demonstrates that these tasks usually do not test models for word-in-context representations as such, so their results are open to misinterpretation. We recommend our framework as a sanity check for context and target-word biases in future task design and application in lexical semantics.
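The abstract does not spell out how the bias measures are computed. As a purely illustrative sketch, and not the paper's definitions, one could compare a context-only probe (target word masked) and a word-only probe (context removed) against a probe that sees both, measure how much of the full probe's above-chance accuracy each partial input recovers, and place the dataset on a context-versus-word continuum. The names `ProbeScores`, `recovered_fraction`, and `bias_profile`, the chance-corrected ratio, and the [-1, 1] scale are all hypothetical assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProbeScores:
    """Accuracies of three hypothetical probing setups on one dataset."""
    full: float          # probe sees both the context and the target word
    context_only: float  # target word masked, only the context is shown
    word_only: float     # context removed, only the target word is shown
    chance: float = 0.5  # chance accuracy, e.g. 0.5 for a balanced binary task


def recovered_fraction(partial: float, full: float, chance: float) -> float:
    """Hypothetical: share of the full probe's above-chance gain that a partial input recovers."""
    gain = full - chance
    if gain <= 0:
        raise ValueError("full-input accuracy must exceed chance")
    # Clip to [0, 1]: below-chance probes recover nothing, above-full counts as all.
    return min(max((partial - chance) / gain, 0.0), 1.0)


def bias_profile(s: ProbeScores) -> tuple[float, float, float]:
    """Return (context_bias, word_bias, position) for one dataset (hypothetical scale).

    position is +1 when only the context-only probe recovers performance,
    -1 when only the word-only probe does, and 0 when both recover equally
    (including when neither recovers anything).
    """
    c = recovered_fraction(s.context_only, s.full, s.chance)
    w = recovered_fraction(s.word_only, s.full, s.chance)
    total = c + w
    position = 0.0 if total == 0 else (c - w) / total
    return c, w, position
```

Because a position of 0 cannot distinguish "both partial inputs suffice" from "neither suffices", the sketch also returns the two recovered fractions: under this illustrative scheme, a dataset that genuinely requires both the context and the target word would show low values for both.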




Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution

Lexical substitution, i.e. generation of plausible words that can replac...

Exploring BERT's Sensitivity to Lexical Cues using Tests from Semantic Priming

Models trained to estimate word probabilities in context have become ubi...

Exploring the Representation of Word Meanings in Context: A Case Study on Homonymy and Synonymy

This paper presents a multilingual study of word meaning representations...

Swords: A Benchmark for Lexical Substitution with Improved Data Coverage and Quality

We release a new benchmark for lexical substitution, the task of finding...

A Critique of a Critique of Word Similarity Datasets: Sanity Check or Unnecessary Confusion?

Critical evaluation of word similarity datasets is very important for co...

The concept "altruism" for sociological research: from conceptualization to operationalization

This article addresses the question of the relevant conceptualization of...

Evaluating language-biased image classification based on semantic representations

Humans show language-biased image recognition for a word-embedded image,...
