The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

10/27/2021
by   Vivek Borkar, et al.
16

The paper concerns convergence and asymptotic statistics for stochastic approximation driven by Markovian noise: θ_n+1= θ_n + α_n + 1 f(θ_n, Φ_n+1) , n≥ 0, in which each θ_n∈^d, {Φ_n } is a Markov chain on a general state space X with stationary distribution π, and f:^d×X→^d. In addition to standard Lipschitz bounds on f, and conditions on the vanishing step-size sequence {α_n}, it is assumed that the associated ODE is globally asymptotically stable with stationary point denoted θ^*, where f̅(θ)=E[f(θ,Φ)] with Φ∼π. Moreover, the ODE@∞ defined with respect to the vector field, f̅_∞(θ):= lim_r→∞ r^-1f̅(rθ) , θ∈^d, is asymptotically stable. The main contributions are summarized as follows: (i) The sequence θ is convergent if Φ is geometrically ergodic, and subject to compatible bounds on f. The remaining results are established under a stronger assumption on the Markov chain: A slightly weaker version of the Donsker-Varadhan Lyapunov drift condition known as (DV3). (ii) A Lyapunov function is constructed for the joint process {θ_n,Φ_n} that implies convergence of {θ_n} in L_4. (iii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error z_n:= (θ_n-θ^*)/√(α_n). Moment bounds combined with the CLT imply convergence of the normalized covariance, lim_n →∞ E [ z_n z_n^T ] = Σ_θ, where Σ_θ is the asymptotic covariance appearing in the CLT. (iv) An example is provided where the Markov chain Φ is geometrically ergodic but it does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset