A Version of Geiringer-like Theorem for Decision Making in the Environments with Randomness and Incomplete Information

10/20/2011
by Boris Mitavskiy, et al.

Purpose: In recent years, Monte-Carlo sampling methods such as Monte-Carlo tree search have achieved tremendous success in model-free reinforcement learning. The combination of the so-called upper confidence bounds policy, which selects actions for sample evaluation while preserving the "exploration vs. exploitation" balance, with massive computing power to store and dynamically update a rather large pre-evaluated game tree has led to software that has beaten the top human player in the game of Go on a 9 by 9 board. Much current research is devoted to widening the applicability of the Monte-Carlo sampling methodology to partially observable Markov decision processes with non-immediate payoffs. The main challenge introduced by randomness and incomplete information is evaluating actions at the chance nodes, because the same action can lead to drastically different payoffs. The aim of this article is to establish a version of a theorem that originated in population genetics and was later adopted in evolutionary computation theory; this theorem will lead to novel Monte-Carlo sampling algorithms that provably increase the AI potential. Due to space limitations, the algorithms themselves will be presented in sequel papers. The current paper provides a solid mathematical foundation for developing such algorithms and explains why they are so promising.


research
05/11/2013

A Further Generalization of the Finite-Population Geiringer-like Theorem for POMDPs to Allow Recombination Over Arbitrary Set Covers

A popular current research trend deals with expanding the Monte-Carlo tr...
research
05/11/2013

Geiringer Theorems: From Population Genetics to Computational Intelligence, Memory Evolutive Systems and Hebbian Learning

The classical Geiringer theorem addresses the limiting frequency of occu...
research
06/02/2022

Policy Gradient Algorithms with Monte-Carlo Tree Search for Non-Markov Decision Processes

Policy gradient (PG) is a reinforcement learning (RL) approach that opti...
research
06/14/2018

Learning in POMDPs with Monte Carlo Tree Search

The POMDP is a powerful framework for reasoning under outcome and inform...
research
12/17/2021

On the Evolution of the MCTS Upper Confidence Bounds for Trees by Means of Evolutionary Algorithms in the Game of Carcassonne

Monte Carlo Tree Search (MCTS) is a sampling best-first method to search...
research
03/13/2018

Fractal AI: A fragile theory of intelligence

Fractal AI is a theory for general artificial intelligence. It allows to...
research
03/15/2012

Understanding Sampling Style Adversarial Search Methods

UCT has recently emerged as an exciting new adversarial reasoning techni...
