We test the hypothesis that language models trained with reinforcement
l...
As AI systems become more capable, we would like to enlist their help to...
Developing safe and useful general-purpose AI systems will require us to...
Neural networks often pack many unrelated concepts into a single neuron ...
We describe our early efforts to red team language models in order to
si...
We study whether language models can evaluate the validity of their own
...
We apply preference modeling and reinforcement learning from human feedb...
Large-scale pre-training has recently emerged as a technique for creatin...