We test the hypothesis that language models trained with reinforcement
l...
As AI systems become more capable, we would like to enlist their help to...
Developing safe and useful general-purpose AI systems will require us to...
"Induction heads" are attention heads that implement a simple algorithm ...
Neural networks often pack many unrelated concepts into a single neuron ...
We describe our early efforts to red team language models in order to
si...
We study whether language models can evaluate the validity of their own
...
Recent large language models have been trained on vast datasets, but als...
We apply preference modeling and reinforcement learning from human feedb...
Large-scale pre-training has recently emerged as a technique for creatin...
Given the broad capabilities of large language models, it should be poss...
We introduce Codex, a GPT language model fine-tuned on publicly availabl...
We identify empirical scaling laws for the cross-entropy loss in four
do...
As language models become more powerful, training and evaluation are
inc...
Recent work has demonstrated substantial gains on many NLP tasks and
ben...
We study empirical scaling laws for language model performance on the
cr...
Reward learning enables the application of reinforcement learning (RL) t...
In an increasing number of domains it has been demonstrated that deep
le...
To solve complex real-world problems with reinforcement learning, we can...
Many real world learning tasks involve complex or hard-to-specify object...
We explore methods for option discovery based on variational inference a...
To make AI systems broadly useful for challenging real-world tasks, we n...
This report surveys the landscape of potential security threats from
mal...
For sophisticated reinforcement learning (RL) systems to interact useful...
Learning a natural language interface for database tables is a challengi...
Rapid progress in machine learning and artificial intelligence (AI) has
...
We show that an end-to-end deep learning approach can be used to recogni...