The Sensitivity of Language Models and Humans to Winograd Schema Perturbations

05/04/2020
by Mostafa Abdou, et al.

Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of common-sense reasoning ability. We show, however, with a new diagnostic dataset, that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding. Our results highlight interesting differences between humans and language models: language models are more sensitive to number and gender alternations and synonym replacements than humans; humans are more stable and consistent in their predictions, maintain a much higher absolute performance, and perform better on non-associative instances than associative ones. Overall, humans are correct more often than out-of-the-box models, and the models are sometimes right for the wrong reasons. Finally, we show that fine-tuning on a large, task-specific dataset can offer a solution to these issues.
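To make the perturbation types concrete, here is a minimal sketch of two of them, number alternation and synonym replacement, applied to a classic Winograd schema sentence. The sentence, the hand-written replacement rules, and the function names are illustrative assumptions, not the paper's actual dataset or code.

```python
# Illustrative sketch of Winograd schema perturbations (hypothetical example,
# not the paper's diagnostic dataset).

ORIGINAL = ("The city councilmen refused the demonstrators a permit "
            "because they feared violence.")

def number_alternation(sentence: str) -> str:
    """Switch the plural referents to singular, adjusting the pronoun to match
    (hand-written rules specific to this example sentence)."""
    return (sentence
            .replace("councilmen", "councilman")
            .replace("the demonstrators", "the demonstrator")
            .replace("they feared", "he feared"))

def synonym_replacement(sentence: str) -> str:
    """Swap one content word for a near-synonym; the coreference the schema
    tests is left intact."""
    return sentence.replace("feared", "dreaded")

perturbed = synonym_replacement(number_alternation(ORIGINAL))
print(perturbed)
# The pronoun still refers to the councilman; a human reader resolves both
# versions the same way, which is the sense in which these perturbations
# "minimally affect human understanding."
```

The point of such perturbations is that the correct coreference answer is unchanged, so any drop in model accuracy on the perturbed version reflects surface sensitivity rather than a change in the underlying reasoning problem.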


