
Are Emergent Abilities of Large Language Models a Mirage?

by Rylan Schaeffer, et al.

Recent work claims that large language models display emergent abilities: abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, one can choose a metric which leads to the inference of an emergent ability or another metric which does not. Thus, our alternative suggests that existing claims of emergent abilities are creations of the researcher's analyses, not fundamental changes in model behavior on specific tasks with scale. We present our explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities; (2) make, test and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how similar metric decisions suggest apparent emergent abilities on vision tasks in diverse deep network architectures (convolutional, autoencoder, transformer). In all three analyses, we find strong supporting evidence that emergent abilities may not be a fundamental property of scaling AI models.
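The metric-choice argument in the abstract can be illustrated with a small numerical sketch. The numbers below are hypothetical, not taken from the paper: we assume per-token accuracy improves smoothly (and unspectacularly) with model scale, then compare two metrics on the same outputs. A linear, per-token metric shows gradual improvement, while a nonlinear all-or-nothing metric (exact match over a 10-token sequence, i.e. per-token accuracy raised to the 10th power) stays near zero for small models and then rises steeply, which can be misread as a sharp, unpredictable "emergent" ability.

```python
import math

def per_token_accuracy(log10_params):
    """Hypothetical smooth scaling curve: per-token accuracy rises
    gradually from 0.5 toward 1.0 as (log) parameter count grows.
    No sharp transition anywhere in this function."""
    return 1.0 - 0.5 * math.exp(-0.5 * (log10_params - 7))

SEQ_LEN = 10  # exact match requires all 10 output tokens to be correct

for log_n in [7, 8, 9, 10, 11]:  # models from 10^7 to 10^11 parameters
    p = per_token_accuracy(log_n)
    exact_match = p ** SEQ_LEN  # nonlinear metric applied to the same outputs
    print(f"10^{log_n} params: token accuracy={p:.3f}, exact match={exact_match:.4f}")
```

Both columns are computed from the same underlying model behavior; only the metric differs. The per-token column changes gradually across four orders of magnitude of scale, while the exact-match column is negligible for the smaller models and only becomes appreciable at the largest scales, reproducing the apparent sharpness and unpredictability the paper attributes to metric choice.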

