AlphaZero is a self-play reinforcement learning algorithm that achieves ...
The reward hypothesis posits that "all of what we mean by goals and pur...
While it is known that communication facilitates cooperation in multi-ag...
Neural replicator dynamics (NeuRD) is an alternative to the foundational...
Hindsight rationality is an approach to playing general-sum games that p...
Model-based Reinforcement Learning (MBRL) holds promise for data-efficie...
Games have a long history of serving as a benchmark for progress in arti...
We introduce the partially observable history process (POHP) formalism f...
A key challenge in the field of reinforcement learning is to develop age...
Hindsight rationality is an approach to playing multi-agent, general-sum...
For artificially intelligent learning systems to have widespread applica...
Driven by recent successes in two-player, zero-sum game solving and play...
Reinforcement learning is a powerful learning paradigm in which agents c...
Regret minimization has played a key role in online learning, equilibriu...
Search has played a fundamental role in computer game research since the...
Sample-based planning is a powerful family of algorithms for generating ...
Human-computer interactive systems that rely on machine learning are bec...
A common metric in games of imperfect information is exploitability, i.e...
Function approximation is a powerful approach for structuring large deci...
Extensive-form games (EFGs) are a common model of multi-agent interactio...
Multiagent decision-making problems in partially observable environments...
Artificial agents have been shown to learn to communicate when needed to...
From the early days of computing, games have been important testbeds for...
When observing the actions of others, humans carry out inferences about ...
Optimization of parameterized policies for reinforcement learning (RL) i...
Deep reinforcement learning (RL) algorithms have shown an impressive abi...
Extensive-form games are a common model for multiagent interactions with...
Learning strategies for imperfect information games from samples of inte...
The problem of exploration in reinforcement learning is well-understood ...
Dyna is an architecture for reinforcement learning agents that interleav...
Artificial intelligence has seen several breakthroughs in recent years, ...
Evaluating agent performance when outcomes are stochastic and agents use...
We propose a novel online learning method for minimizing regret in large...
In Reinforcement Learning (RL), it is common to use optimistic initializ...
This paper introduces the Partition Tree Weighting technique, an efficie...
In this article we introduce the Arcade Learning Environment (ALE): both...
Online learning aims to perform nearly as well as the best hypothesis in...
Counterfactual Regret Minimization (CFR) is an efficient no-regret learn...
We consider the problem of simultaneously learning to linearly combine a...
The success of kernel-based learning methods depends on the choice of ker...