Dario Amodei

research

∙ 02/15/2023

The Capacity for Moral Self-Correction in Large Language Models

We test the hypothesis that language models trained with reinforcement l...

0 Deep Ganguli, et al. ∙

research

∙ 12/15/2022

Constitutional AI: Harmlessness from AI Feedback

As AI systems become more capable, we would like to enlist their help to...

0 Yuntao Bai, et al. ∙

research

∙ 11/04/2022

Measuring Progress on Scalable Oversight for Large Language Models

Developing safe and useful general-purpose AI systems will require us to...

0 Samuel R. Bowman, et al. ∙

research

∙ 09/24/2022

In-context Learning and Induction Heads

"Induction heads" are attention heads that implement a simple algorithm ...

8 Catherine Olsson, et al. ∙

research

∙ 09/21/2022

Toy Models of Superposition

Neural networks often pack many unrelated concepts into a single neuron ...

12 Nelson Elhage, et al. ∙

research

∙ 08/23/2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

We describe our early efforts to red team language models in order to si...

0 Deep Ganguli, et al. ∙

research

∙ 07/11/2022

Language Models (Mostly) Know What They Know

We study whether language models can evaluate the validity of their own ...

12 Saurav Kadavath, et al. ∙

research

∙ 05/21/2022

Scaling Laws and Interpretability of Learning from Repeated Data

Recent large language models have been trained on vast datasets, but als...

0 Danny Hernandez, et al. ∙

research

∙ 04/12/2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

We apply preference modeling and reinforcement learning from human feedb...

2 Yuntao Bai, et al. ∙

research

∙ 02/15/2022

Predictability and Surprise in Large Generative Models

Large-scale pre-training has recently emerged as a technique for creatin...

0 Deep Ganguli, et al. ∙

research

∙ 12/01/2021

A General Language Assistant as a Laboratory for Alignment

Given the broad capabilities of large language models, it should be poss...

11 Amanda Askell, et al. ∙

research

∙ 07/07/2021

Evaluating Large Language Models Trained on Code

We introduce Codex, a GPT language model fine-tuned on publicly availabl...

6 Mark Chen, et al. ∙

research

∙ 10/28/2020

Scaling Laws for Autoregressive Generative Modeling

We identify empirical scaling laws for the cross-entropy loss in four do...

2 Tom Henighan, et al. ∙

research

∙ 09/02/2020

Learning to summarize from human feedback

As language models become more powerful, training and evaluation are inc...

68 Nisan Stiennon, et al. ∙

research

∙ 05/28/2020

Language Models are Few-Shot Learners

Recent work has demonstrated substantial gains on many NLP tasks and ben...

34 Tom B. Brown, et al. ∙

research

∙ 01/23/2020

Scaling Laws for Neural Language Models

We study empirical scaling laws for language model performance on the cr...

0 Jared Kaplan, et al. ∙

research

∙ 09/18/2019

Fine-Tuning Language Models from Human Preferences

Reward learning enables the application of reinforcement learning (RL) t...

0 Daniel M. Ziegler, et al. ∙

research

∙ 12/14/2018

An Empirical Model of Large-Batch Training

In an increasing number of domains it has been demonstrated that deep le...

0 Sam McCandlish, et al. ∙

research

∙ 11/15/2018

Reward learning from human preferences and demonstrations in Atari

To solve complex real-world problems with reinforcement learning, we can...

6 Borja Ibarz, et al. ∙

research

∙ 10/19/2018

Supervising strong learners by amplifying weak experts

Many real world learning tasks involve complex or hard-to-specify object...

0 Paul Christiano, et al. ∙

research

∙ 07/26/2018

Variational Option Discovery Algorithms

We explore methods for option discovery based on variational inference a...

0 Joshua Achiam, et al. ∙

research

∙ 05/02/2018

AI safety via debate

To make AI systems broadly useful for challenging real-world tasks, we n...

0 Geoffrey Irving, et al. ∙

research

∙ 02/20/2018

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation

This report surveys the landscape of potential security threats from mal...

3 Miles Brundage, et al. ∙

research

∙ 06/12/2017

Deep reinforcement learning from human preferences

For sophisticated reinforcement learning (RL) systems to interact useful...

0 Paul Christiano, et al. ∙

research

∙ 11/28/2016

Learning a Natural Language Interface with Neural Programmer

Learning a natural language interface for database tables is a challengi...

0 Arvind Neelakantan, et al. ∙

research

∙ 06/21/2016

Concrete Problems in AI Safety

Rapid progress in machine learning and artificial intelligence (AI) has ...

0 Dario Amodei, et al. ∙

research

∙ 12/08/2015

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

We show that an end-to-end deep learning approach can be used to recogni...

0 Dario Amodei, et al. ∙

