BERTnesia: Investigating the capture and forgetting of knowledge in BERT

10/19/2020
by Jonas Wallat, et al.

Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in their learned representations. In this paper, we probe BERT specifically to understand and measure the relational knowledge it captures. We use knowledge base completion tasks to probe every layer of pre-trained BERT as well as BERT fine-tuned for ranking, question answering, and NER. Our findings show that knowledge is not contained solely in BERT's final layers: intermediate layers contribute a significant amount (17-60%) to the total knowledge found, and probing them reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten; the extent of forgetting depends on the fine-tuning objective but not on the size of the dataset. We find that ranking models forget the least and retain more knowledge in their final layer. We release our code on GitHub so the experiments can be repeated.
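As a rough illustration of what layer-wise relational probing can look like (not the paper's exact protocol), the sketch below feeds a cloze-style fact into bert-base-uncased and decodes the [MASK] position from every intermediate layer by reusing the pretrained masked-language-model head. The model name, the example fact, and the reuse of the MLM head as a per-layer decoder are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch of LAMA-style, layer-wise cloze probing of BERT.
# Assumptions: bert-base-uncased, a single example fact, and reuse of the
# pretrained MLM head as a decoder for every intermediate layer.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained(
    "bert-base-uncased", output_hidden_states=True
)
model.eval()

# Cloze-style query expressing a relational fact, with the object masked out.
query = "The capital of France is [MASK]."
inputs = tokenizer(query, return_tensors="pt")
mask_index = (
    inputs["input_ids"][0] == tokenizer.mask_token_id
).nonzero(as_tuple=True)[0]

with torch.no_grad():
    outputs = model(**inputs)
    # hidden_states: tuple of (embedding output + one tensor per layer)
    hidden_states = outputs.hidden_states

# Decode the [MASK] position from every layer's hidden states so each
# layer produces its own top-1 prediction for the masked object.
for layer, states in enumerate(hidden_states[1:], start=1):
    logits = model.cls(states)  # MLM head maps hidden states to vocab logits
    top_token = logits[0, mask_index].argmax(dim=-1)
    print(f"layer {layer:2d}: {tokenizer.decode(top_token)}")
```

Comparing the per-layer predictions in this way is one simple proxy for how much relational knowledge each layer exposes; the paper's full probing setup is described in the released code.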

Related research

11/08/2019 · What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning
Pretrained transformer-based language models have achieved state of the ...

06/02/2020 · A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension
Pre-trained models have brought significant improvements to many NLP tas...

09/17/2021 · Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers
Despite the success of fine-tuning pretrained language encoders like BER...

10/12/2021 · ALL Dolphins Are Intelligent and SOME Are Friendly: Probing BERT for Nouns' Semantic Properties and their Prototypicality
Large scale language models encode rich commonsense knowledge acquired t...

05/05/2019 · Investigating the Successes and Failures of BERT for Passage Re-Ranking
The bidirectional encoder representations from transformers (BERT) model...

04/16/2021 · Editing Factual Knowledge in Language Models
The factual knowledge acquired during pretraining and stored in the para...

04/04/2023 · Can BERT eat RuCoLA? Topological Data Analysis to Explain
This paper investigates how Transformer language models (LMs) fine-tuned...
