An Algorithmic Theory of Metacognition in Minds and Machines

by Rylan Schaeffer, et al.
Stanford University

Humans sometimes choose actions that they themselves can identify as suboptimal, or wrong, even in the absence of additional information. How is this possible? We present an algorithmic theory of metacognition based on a well-understood trade-off in reinforcement learning (RL) between value-based RL and policy-based RL. For the cognitive (neuro)science community, our theory answers the outstanding question of why information can be used for error detection but not for action selection. For the machine learning community, our theory introduces a novel interaction between the Actor and the Critic in Actor-Critic agents and identifies a novel connection between RL and Bayesian Optimization. We call our proposed agent the Metacognitive Actor Critic (MAC). We conclude by showing how to create metacognition in machines: we implement a deep MAC and show that it can detect (some of) its own suboptimal actions without external information or delay.
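As a rough illustration of the idea that an agent can flag its own action as suboptimal without new information, the sketch below lets a policy (the Actor) pick an action and then checks that choice against separately held value estimates (the Critic). It is a minimal toy under stated assumptions, not the MAC algorithm itself: the abstract does not specify how the Actor and Critic interact, and the names `select_action`, `flags_own_error`, and `margin`, along with all numeric values, are hypothetical.

```python
import random


def select_action(policy_probs, rng):
    """Sample an action index from the actor's policy distribution."""
    actions = range(len(policy_probs))
    return rng.choices(actions, weights=policy_probs, k=1)[0]


def flags_own_error(q_values, action, margin=0.0):
    """Return True if the critic values some other action above the chosen one.

    Assumption for illustration only: an "error" is any chosen action whose
    critic value falls short of the critic's best value by more than `margin`.
    """
    return max(q_values) - q_values[action] > margin


# Hypothetical numbers: the actor prefers action 0, the critic prefers action 1.
policy_probs = [0.7, 0.2, 0.1]
q_values = [1.0, 2.5, 0.3]

rng = random.Random(0)
action = select_action(policy_probs, rng)
suspected_error = flags_own_error(q_values, action)
print(f"chose action {action}; critic flags it as suboptimal: {suspected_error}")
```

The check uses only quantities the agent already holds, so the flag is available immediately after the action is chosen, with no external feedback. Whether and how such a disagreement arises in a trained agent is the subject of the paper and is not modeled here.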


Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

Learning optimal behavior from existing data is one of the most importan...

Reinforcement Learning Architectures: SAC, TAC, and ESAC

The trend is to implement intelligent agents capable of analyzing availa...

Advantage Actor-Critic with Reasoner: Explaining the Agent's Behavior from an Exploratory Perspective

Reinforcement learning (RL) is a powerful tool for solving complex decis...

A Human-Centered Data-Driven Planner-Actor-Critic Architecture via Logic Programming

Recent successes of Reinforcement Learning (RL) allow an agent to learn ...

PACMAN: A Planner-Actor-Critic Architecture for Human-Centered Planning and Learning

Conventional reinforcement learning (RL) allows an agent to learn polici...

AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers

The exploration mechanism used by a Deep Reinforcement Learning (RL) age...

Deconfounding Actor-Critic Network with Policy Adaptation for Dynamic Treatment Regimes

Despite intense efforts in basic and clinical research, an individualize...
