A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle

03/22/2022
by   Ziniu Li, et al.

Q-learning with function approximation can diverge in the off-policy setting, and the target network is a powerful technique for addressing this issue. In this manuscript, we examine the sample complexity of the associated target Q-learning algorithm in the tabular case with a generative oracle. We point out a misleading claim in [Lee and He, 2020] and establish a tight analysis. In particular, we demonstrate that the sample complexity of the target Q-learning algorithm in [Lee and He, 2020] is 𝒪(|𝒮|^2 |𝒜|^2 (1-γ)^{-5} ε^{-2}). Furthermore, we show that this sample complexity improves to 𝒪(|𝒮||𝒜| (1-γ)^{-5} ε^{-2}) if we can sequentially update all state-action pairs, and to 𝒪(|𝒮||𝒜| (1-γ)^{-4} ε^{-2}) if, in addition, γ lies in (1/2, 1). Compared with vanilla Q-learning, our results show that introducing a periodically frozen target Q-function does not sacrifice sample complexity.
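To make the setting concrete, here is a minimal sketch of tabular target Q-learning with a generative oracle, in the variant that sequentially updates all state-action pairs. It is an illustration under stated assumptions, not the exact algorithm analyzed in the manuscript or in [Lee and He, 2020]: the function names (`sample_next_state`, `target_q_learning`, `optimal_q`), the nested-list layout of `P` and `R`, and the 1/(t+1) step size are all hypothetical choices made for this example.

```python
import random


def sample_next_state(P, s, a, rng):
    """Generative oracle (assumed interface): draw s' ~ P(. | s, a) on demand."""
    return rng.choices(range(len(P[s][a])), weights=P[s][a])[0]


def target_q_learning(P, R, gamma, outer_iters, inner_steps, rng):
    """Tabular target Q-learning sketch.

    P[s][a] is a list of next-state probabilities; R[s][a] is the reward.
    """
    n_states, n_actions = len(R), len(R[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(outer_iters):
        # Freeze a copy of the current estimate as the target Q-function.
        q_target = [row[:] for row in q]
        for t in range(inner_steps):
            # Assumed step size: 1/(t+1) makes q a running average of targets.
            lr = 1.0 / (t + 1)
            # Sequentially update every state-action pair once per inner step.
            for s in range(n_states):
                for a in range(n_actions):
                    s_next = sample_next_state(P, s, a, rng)
                    td_target = R[s][a] + gamma * max(q_target[s_next])
                    q[s][a] += lr * (td_target - q[s][a])
    return q


def optimal_q(P, R, gamma, iters=1000):
    """Exact Q* by value iteration, used only as a reference for comparison."""
    n_states, n_actions = len(R), len(R[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        v = [max(row) for row in q]
        q = [
            [
                R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
    return q
```

In this sketch the total number of oracle calls is `outer_iters * inner_steps * |𝒮||𝒜|`, which is the quantity the sample complexity bounds above count. Each outer iteration approximates one application of the Bellman optimality operator to the frozen target, so the estimate can be compared against `optimal_q` on a small MDP to check how the error shrinks as both loop lengths grow.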


research
03/05/2022

Target Network and Truncation Overcome The Deadly Triad in Q-Learning

Q-learning with function approximation is one of the most empirically su...
research
09/28/2020

Best Policy Identification in discounted MDPs: Problem-specific Sample Complexity

We investigate the problem of best-policy identification in discounted M...
research
08/14/2020

On the Sample Complexity of Super-Resolution Radar

We point out an issue with Lemma 8.6 of [1]. This lemma specifies the re...
research
09/07/2023

Gradient-Based Feature Learning under Structured Data

Recent works have demonstrated that the sample complexity of gradient-ba...
research
10/27/2021

Provable Lifelong Learning of Representations

In lifelong learning, the tasks (or classes) to be learned arrive sequen...
research
02/23/2020

Periodic Q-Learning

The use of target networks is a common practice in deep reinforcement le...
research
02/04/2020

Learning bounded subsets of L_p

We study learning problems in which the underlying class is a bounded su...
