Reward-Weighted Regression Converges to a Global Optimum

07/19/2021
by Miroslav Štrupl, et al.

Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize a return-weighted log-likelihood of actions. Although RWR is known to yield monotonic improvement of the policy under certain circumstances, whether and under which conditions RWR converges to the optimal policy have remained open questions. In this paper, we provide for the first time a proof that RWR converges to a global optimum when no function approximation is used.
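The iteration described above (sample a batch with the current policy, then fit a new policy by return-weighted maximum likelihood) can be sketched in a few lines. This is only an illustrative sketch under simplifying assumptions that are not taken from the paper: a single-state problem (a bandit) instead of a full MDP, strictly positive deterministic rewards, and a tabular categorical policy, so no function approximation is involved. The names `rwr_fit`, `rwr_exact_step`, `run_sampled_rwr`, and the `REWARDS` values are hypothetical.

```python
import random


def rwr_fit(actions, returns, n_actions):
    """Return-weighted maximum-likelihood fit of a categorical policy.

    Maximizing sum_i R_i * log pi(a_i) over categorical pi (with R_i > 0)
    gives pi(a) = (sum of returns of samples with action a) / (sum of all returns).
    """
    totals = [0.0] * n_actions
    for a, r in zip(actions, returns):
        totals[a] += r
    z = sum(totals)
    return [t / z for t in totals]


def rwr_exact_step(policy, rewards):
    """Infinite-batch limit of rwr_fit: pi'(a) is proportional to pi(a) * r(a)."""
    weighted = [p * r for p, r in zip(policy, rewards)]
    z = sum(weighted)
    return [w / z for w in weighted]


def expected_return(policy, rewards):
    """Expected one-step return of a categorical policy."""
    return sum(p * r for p, r in zip(policy, rewards))


def run_sampled_rwr(rewards, iterations=50, batch_size=2000, seed=0):
    """Sample-based loop: draw a batch from the current policy, then refit."""
    rng = random.Random(seed)
    n = len(rewards)
    policy = [1.0 / n] * n  # start from the uniform policy
    for _ in range(iterations):
        actions = rng.choices(range(n), weights=policy, k=batch_size)
        returns = [rewards[a] for a in actions]
        policy = rwr_fit(actions, returns, n)
    return policy


# Hypothetical positive rewards, one per action (an assumption for illustration).
REWARDS = [1.0, 2.0, 3.0]

if __name__ == "__main__":
    pi = [1.0 / 3] * 3
    for step in range(10):
        print(step, [round(p, 4) for p in pi], round(expected_return(pi, REWARDS), 4))
        pi = rwr_exact_step(pi, REWARDS)
    print("sampled:", [round(p, 4) for p in run_sampled_rwr(REWARDS)])
```

In this toy setting the exact step never decreases the expected return, and the probability mass drifts toward the highest-reward action. The sampled loop behaves similarly, but an action that receives no samples in a batch gets probability zero and cannot recover, which is a limitation of this sketch rather than a statement about the paper's analysis.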

research
02/15/2020

Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling

Despite the wide applications of Adam in reinforcement learning (RL), th...
research
10/26/2021

Hinge Policy Optimization: Rethinking Policy Improvement and Reinterpreting PPO

Policy optimization is a fundamental principle for designing reinforceme...
research
01/18/2023

DIRECT: Learning from Sparse and Shifting Rewards using Discriminative Reward Co-Training

We propose discriminative reward co-training (DIRECT) as an extension to...
research
04/22/2020

Per-Step Reward: A New Perspective for Risk-Averse Reinforcement Learning

We present a new per-step reward perspective for risk-averse control in ...
research
05/24/2019

Neural Temporal-Difference Learning Converges to Global Optima

Temporal-difference learning (TD), coupled with neural networks, is amon...
research
03/13/2023

Path Planning using Reinforcement Learning: A Policy Iteration Approach

With the impact of real-time processing being realized in the recent pas...
research
11/28/2017

Hierarchical Policy Search via Return-Weighted Density Estimation

Learning an optimal policy from a multi-modal reward function is a chall...
