Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm

06/02/2022
by   Semih Cayci, et al.
0

Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces. In this paper, we present a finite-time analysis of NAC with neural network approximation, and identify the roles of neural networks, regularization and optimization techniques (e.g., gradient clipping and averaging) to achieve provably good performance in terms of sample complexity, iteration complexity and overparametrization bounds for the actor and the critic. In particular, we prove that (i) entropy regularization and averaging ensure stability by providing sufficient exploration to avoid near-deterministic and strictly suboptimal policies and (ii) regularization leads to sharp sample complexity and network width bounds in the regularized MDPs, yielding a favorable bias-variance tradeoff in policy optimization. In the process, we identify the importance of uniform approximation power of the actor neural network to achieve global optimality in policy optimization due to distributional shift.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/18/2023

On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization

Actor-critic algorithms have shown remarkable success in solving state-o...
research
05/26/2021

Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation

In this paper, we develop a novel variant of off-policy natural actor-cr...
research
07/13/2019

Stochastic Convergence Results for Regularized Actor-Critic Methods

In this paper, we present a stochastic convergence proof, under suitable...
research
10/21/2021

Actor-critic is implicitly biased towards high entropy optimal policies

We show that the simplest actor-critic method – a linear softmax policy ...
research
02/18/2021

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

In this paper, we provide finite-sample convergence guarantees for an of...
research
12/27/2021

Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic

Actor-critic (AC) algorithms, empowered by neural networks, have had sig...
research
06/20/2023

Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards

In a broad class of reinforcement learning applications, stochastic rewa...

Please sign up or login with your details

Forgot password? Click here to reset