How is BERT surprised? Layerwise detection of linguistic anomalies

05/16/2021
by   Bai Li, et al.

Transformer language models have shown remarkable ability to detect when a word is anomalous in context, but likelihood scores offer no information about the cause of the anomaly. In this work, we use Gaussian models for density estimation at intermediate layers of three language models (BERT, RoBERTa, and XLNet), and evaluate our method on BLiMP, a grammaticality judgement benchmark. In lower layers, surprisal is highly correlated with low token frequency, but this correlation diminishes in upper layers. Next, we gather datasets of morphosyntactic, semantic, and commonsense anomalies from psycholinguistic studies; we find that the best-performing model, RoBERTa, exhibits surprisal at earlier layers when the anomaly is morphosyntactic than when it is semantic, while commonsense anomalies do not exhibit surprisal at any intermediate layer. These results suggest that language models employ separate mechanisms to detect different types of linguistic anomalies.
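The core idea, fitting a density model to a layer's token embeddings and scoring new tokens by their negative log-density, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a single multivariate Gaussian on synthetic vectors standing in for one layer's hidden states, where the paper fits Gaussian models to real BERT/RoBERTa/XLNet activations.

```python
import numpy as np

def fit_gaussian(embeddings):
    """Fit a multivariate Gaussian to token embeddings from one layer
    (rows = tokens, columns = hidden dimensions)."""
    mu = embeddings.mean(axis=0)
    # Small ridge term keeps the covariance invertible in high dimensions.
    cov = np.cov(embeddings, rowvar=False) + 1e-3 * np.eye(embeddings.shape[1])
    return mu, cov

def surprisal(x, mu, cov):
    """Negative log-density of a token embedding under the fitted Gaussian."""
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    k = len(mu)
    return 0.5 * (d @ np.linalg.solve(cov, d) + logdet + k * np.log(2 * np.pi))

# Synthetic data standing in for layer activations of in-distribution tokens.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))
mu, cov = fit_gaussian(train)

typical = surprisal(rng.normal(size=8), mu, cov)
anomalous = surprisal(np.full(8, 5.0), mu, cov)  # far from the training cloud
print(typical < anomalous)  # anomalous embeddings score higher surprisal
```

Repeating this per layer, as in the paper, yields a layerwise surprisal profile for each token, which is what reveals at which depth a given anomaly type becomes detectable.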


Related research

07/20/2022 · Integrating Linguistic Theory and Neural Language Models
"Transformer-based language models have recently achieved remarkable resu..."

08/10/2020 · Does BERT Solve Commonsense Task via Commonsense Knowledge?
"The success of pre-trained contextualized language models such as BERT m..."

08/02/2023 · LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs
"We show that large language models (LLMs) are remarkably good at working..."

04/25/2023 · What does BERT learn about prosody?
"Language models have become nearly ubiquitous in natural language proces..."

11/12/2021 · Variation and generality in encoding of syntactic anomaly information in sentence embeddings
"While sentence anomalies have been applied periodically for testing in N..."

08/29/2023 · AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models
"Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have de..."

09/06/2022 · Transfer Learning of Lexical Semantic Families for Argumentative Discourse Units Identification
"Argument mining tasks require an informed range of low to high complexit..."
