Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

05/09/2019
by Max Simchowitz, et al.

This paper establishes that optimistic algorithms attain gap-dependent, non-asymptotic logarithmic regret for episodic MDPs. In contrast to prior work, our bounds do not depend on diameter-like quantities or ergodicity, and they interpolate smoothly between the gap-dependent logarithmic regret and the O(√(HSAT)) minimax rate. The key technique in our analysis is a novel "clipped" regret decomposition, which applies to a broad family of recent optimistic algorithms for episodic MDPs.
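
To make the quantities in the abstract concrete, the following is a minimal sketch (not the paper's algorithm) of how sub-optimality gaps can be computed for a small tabular episodic MDP by backward induction. It uses the standard definition gap_h(s, a) = V*_h(s) - Q*_h(s, a). The function names `optimal_gaps` and `clip`, the toy MDP, and the reading of "clipping" as zeroing out values below a threshold are all assumptions made for illustration; none of them is stated in the abstract.

```python
import numpy as np


def optimal_gaps(P, R, H):
    """Sub-optimality gaps of a tabular episodic MDP (illustrative helper).

    P: array of shape (S, A, S), P[s, a, s2] = transition probability.
    R: array of shape (S, A), mean reward for each state-action pair.
    H: horizon (number of steps per episode).
    Returns an array of shape (H, S, A) with
    gaps[h, s, a] = V*_h(s) - Q*_h(s, a).
    """
    S, A, _ = P.shape
    V_next = np.zeros(S)           # value after the final step is zero
    gaps = np.zeros((H, S, A))
    for h in reversed(range(H)):   # backward induction over the horizon
        Q = R + P @ V_next         # Q*_h, shape (S, A)
        V = Q.max(axis=1)          # V*_h, shape (S,)
        gaps[h] = V[:, None] - Q   # zero for optimal actions, positive otherwise
        V_next = V
    return gaps


def clip(x, eps):
    """Assumed form of clipping: keep x where x >= eps, otherwise 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= eps, x, 0.0)


# Hypothetical toy MDP: 2 states, 2 actions, horizon 2.
# State 0: action 0 stays in state 0 (reward 1.0),
#          action 1 moves to state 1 (reward 0.5).
# State 1: absorbing, reward 0 for both actions.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0
P[1, 1, 1] = 1.0
R = np.array([[1.0, 0.5],
              [0.0, 0.0]])

gaps = optimal_gaps(P, R, H=2)
clipped = clip(np.array([0.1, 0.5]), eps=0.3)
```

In this toy instance the only sub-optimal choice is action 1 in state 0, so that is the only state-action pair with a positive gap; a gap-dependent logarithmic bound scales with the inverse of such gaps, while the minimax rate applies regardless of their size.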

research
07/02/2021

Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

We provide improved gap-dependent regret bounds for reinforcement learni...
research
07/01/2021

Gap-Dependent Bounds for Two-Player Markov Games

As one of the most popular methods in the field of reinforcement learnin...
research
10/09/2018

Adaptive Minimax Regret against Smooth Logarithmic Losses over High-Dimensional ℓ_1-Balls via Envelope Complexity

We develop a new theoretical framework, the envelope complexity, to anal...
research
01/31/2023

Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments

We study variance-dependent regret bounds for Markov decision processes ...
research
03/21/2020

A new regret analysis for Adam-type algorithms

In this paper, we focus on a theory-practice gap for Adam and its varian...
research
02/15/2012

Mirror Descent Meets Fixed Share (and feels no regret)

Mirror descent with an entropic regularizer is known to achieve shifting...
research
08/06/2018

Regret Bounds for Reinforcement Learning via Markov Chain Concentration

We give a simple optimistic algorithm for which it is easy to derive reg...
