Language Models are Few-Shot Learners

by Tom B. Brown, et al.

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
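
As an illustration of tasks and few-shot demonstrations being "specified purely via text", the sketch below assembles a plain-text prompt from an instruction, a handful of solved examples, and an unsolved query. It is a minimal sketch, not the authors' prompt format: the function name `build_few_shot_prompt`, the `Input`/`Output` labels, and the word-unscrambling examples are all assumptions made for illustration. No model is called and no weights are updated; the resulting string would simply be passed to a language model as context for it to continue.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    demonstrations: List[Tuple[str, str]],
    query: str,
    input_label: str = "Input",    # hypothetical label, not from the paper
    output_label: str = "Output",  # hypothetical label, not from the paper
) -> str:
    """Assemble a text prompt: instruction, K solved examples, then the query."""
    lines = [instruction, ""]
    for source, target in demonstrations:
        lines.append(f"{input_label}: {source}")
        lines.append(f"{output_label}: {target}")
        lines.append("")
    # The final block leaves the output empty for the model to complete.
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")
    return "\n".join(lines)


# Illustrative demonstrations for a word-unscrambling task (made up here).
demos = [("lpaep", "apple"), ("nanbaa", "banana")]
prompt = build_few_shot_prompt(
    "Unscramble the letters into an English word.", demos, "rgaep"
)
print(prompt)
```

Passing an empty demonstration list yields a zero-shot prompt (instruction and query only), so the number of in-context examples can be varied without changing anything else.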




Making Pre-trained Language Models Better Few-shot Learners

The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot...

Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining

Pre-trained neural language models bring significant improvement for var...

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Generative Large Language Models (LLMs) have achieved remarkable advance...

Gradient Ascent Post-training Enhances Language Model Generalization

In this work, we empirically show that updating pretrained LMs (350M, 1....

TART: A plug-and-play Transformer module for task-agnostic reasoning

Large language models (LLMs) exhibit in-context learning abilities which...

TruthfulQA: Measuring How Models Mimic Human Falsehoods

We propose a benchmark to measure whether a language model is truthful i...

Self Information Update for Large Language Models through Mitigating Exposure Bias

Current LLMs have demonstrated remarkable capabilities in addressing use...
