Tom Conerly

Featured Co-authors

Samuel R. Bowman
79 publications
Ethan Perez
28 publications
Dario Amodei
27 publications
Stanislav Fort
25 publications
Jack Clark
22 publications
Dawn Drain
21 publications
Jared Kaplan
21 publications
Azalia Mirhoseini
20 publications
Sam McCandlish
19 publications
Tom Brown
17 publications
Amanda Askell
16 publications

research

∙ 12/15/2022

Constitutional AI: Harmlessness from AI Feedback

As AI systems become more capable, we would like to enlist their help to...

Yuntao Bai, et al.

research

∙ 09/24/2022

In-context Learning and Induction Heads

"Induction heads" are attention heads that implement a simple algorithm ...

Catherine Olsson, et al.

research

∙ 08/23/2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

We describe our early efforts to red team language models in order to si...

Deep Ganguli, et al.

research

∙ 07/11/2022

Language Models (Mostly) Know What They Know

We study whether language models can evaluate the validity of their own ...

Saurav Kadavath, et al.

research

∙ 05/21/2022

Scaling Laws and Interpretability of Learning from Repeated Data

Recent large language models have been trained on vast datasets, but als...

Danny Hernandez, et al.

research

∙ 04/12/2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

We apply preference modeling and reinforcement learning from human feedb...

Yuntao Bai, et al.

research

∙ 02/15/2022

Predictability and Surprise in Large Generative Models

Large-scale pre-training has recently emerged as a technique for creatin...

Deep Ganguli, et al.
