Schedule Based Temporal Difference Algorithms
Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD(λ) is a popular class of algorithms for solving this problem. However, the weights assigned to different n-step returns in TD(λ), controlled by the parameter λ, decrease exponentially with increasing n. In this paper, we present a λ-schedule procedure that generalizes the TD(λ) algorithm to the case where the parameter λ can vary with the time step. This allows flexibility in weight assignment, i.e., the user can specify the weights assigned to different n-step returns by choosing a sequence {λ_t}_{t≥1}. Based on this procedure, we propose an on-policy algorithm, TD(λ)-schedule, and two off-policy algorithms, GTD(λ)-schedule and TDC(λ)-schedule. We provide proofs of almost sure convergence for all three algorithms under a general Markov noise framework.
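To make the λ-schedule idea concrete, below is a minimal tabular sketch of an on-policy TD update in which the eligibility-trace decay uses a time-varying λ_t supplied by the user. This is only an illustration under assumed conventions (accumulating traces, a tabular value function, a user-provided `env_step` sampler, and the names `td_lambda_schedule` and `lambdas`); the paper's exact TD(λ)-schedule update and its off-policy variants may differ in detail.

```python
import numpy as np

def td_lambda_schedule(env_step, num_states, lambdas, alpha=0.1, gamma=0.99,
                       num_steps=10_000):
    """Sketch of an on-policy TD update with a time-varying lambda_t.

    Assumptions (not from the abstract): `env_step(s)` returns
    (next_state, reward) sampled under the policy being evaluated,
    and `lambdas[t]` gives lambda_t for time step t.
    """
    V = np.zeros(num_states)   # tabular value-function estimate
    e = np.zeros(num_states)   # eligibility trace
    s = 0                      # assumed initial state
    for t in range(num_steps):
        s_next, r = env_step(s)
        delta = r + gamma * V[s_next] - V[s]           # TD error
        lam_t = lambdas[t] if t < len(lambdas) else lambdas[-1]
        e = gamma * lam_t * e                          # decay with lambda_t
        e[s] += 1.0                                    # accumulating trace
        V += alpha * delta * e                         # TD-style update
        s = s_next
    return V
```

Choosing `lambdas` as a constant sequence recovers ordinary TD(λ) weighting in this sketch, while a non-constant sequence reweights the n-step returns, which is the flexibility the λ-schedule procedure is meant to provide.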