Entity Tracking in Language Models

by Najoung Kim, et al.

Keeping track of how states and relations of entities change as a text or dialog unfolds is a key prerequisite to discourse understanding. Despite this fact, there have been few systematic investigations into the ability of large language models (LLMs) to track discourse entities. In this work, we present a task to probe to what extent a language model can infer the final state of an entity given an English description of the initial state and a series of state-changing operations. We use this task to first investigate whether Flan-T5, GPT-3 and GPT-3.5 can track the state of entities, and find that only GPT-3.5 models, which have been pretrained on large amounts of code, exhibit this ability. We then investigate whether smaller models pretrained primarily on text can learn to track entities, through finetuning T5 on several training/evaluation splits. While performance degrades for more complex splits, we find that even for splits with almost no lexical overlap between training and evaluation, a finetuned model can often perform non-trivial entity tracking. Taken together, these results suggest that language models can learn to track entities but pretraining on large text corpora alone does not make this capacity surface.
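The probing task can be made concrete with a small simulator. The sketch below is an illustration under stated assumptions, not the authors' data generator. It assumes the entities are numbered boxes that hold objects and that the state-changing operations are "put", "remove", and "move". The abstract names none of these, and the names `Op`, `apply_op`, `verbalize`, and `make_probe` are hypothetical. The sketch computes the gold final state by applying the operations in order, then renders the English initial-state description, the operations, and a cloze-style query for a model to complete.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical world: numbered boxes (the entities) map to the set of objects they hold.
State = dict[int, set[str]]


@dataclass(frozen=True)
class Op:
    """One state-changing operation (the operation types are assumptions)."""
    kind: str                  # "put", "remove", or "move"
    obj: str
    src: Optional[int] = None  # box the object leaves (remove/move)
    dst: Optional[int] = None  # box the object enters (put/move)


def apply_op(state: State, op: Op) -> State:
    """Return a new state with `op` applied; the input state is left unchanged."""
    new = {box: set(items) for box, items in state.items()}
    if op.kind == "put":
        new[op.dst].add(op.obj)
    elif op.kind == "remove":
        new[op.src].remove(op.obj)  # KeyError if the object is not in the box
    elif op.kind == "move":
        new[op.src].remove(op.obj)
        new[op.dst].add(op.obj)
    else:
        raise ValueError(f"unknown operation: {op.kind}")
    return new


def contents(items: set[str]) -> str:
    """Verbalize a box's contents in a fixed (sorted) order."""
    return " and ".join(f"the {x}" for x in sorted(items)) if items else "nothing"


def verbalize(op: Op) -> str:
    """Render one operation as an English instruction."""
    if op.kind == "put":
        return f"Put the {op.obj} into Box {op.dst}."
    if op.kind == "remove":
        return f"Remove the {op.obj} from Box {op.src}."
    return f"Move the {op.obj} from Box {op.src} to Box {op.dst}."


def make_probe(initial: State, ops: list[Op], query_box: int) -> tuple[str, str]:
    """Build (prompt, gold answer): initial description, operations, then a cloze query."""
    final = initial
    for op in ops:  # the gold answer requires applying every operation in order
        final = apply_op(final, op)
    sentences = [f"Box {b} contains {contents(initial[b])}." for b in sorted(initial)]
    sentences += [verbalize(op) for op in ops]
    sentences.append(f"Box {query_box} contains")
    return " ".join(sentences), contents(final[query_box])


if __name__ == "__main__":
    initial = {0: {"apple"}, 1: {"book"}, 2: set()}
    ops = [Op("move", "apple", src=0, dst=1),
           Op("put", "key", dst=2),
           Op("remove", "book", src=1)]
    prompt, answer = make_probe(initial, ops, query_box=1)
    print(prompt)
    print(answer)  # the apple
```

In this toy example the initial description still says Box 1 contains the book. A model that copies from the first sentence therefore answers incorrectly, because the correct completion depends on composing all three operations.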

