Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

09/08/2021

by Rajeeva L. Karandikar, et al.
The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a solution to an equation of the form 𝐟(θ) = 0, where 𝐟 : ℝ^d → ℝ^d, when only noisy measurements of 𝐟(·) are available. In the literature to date, one can distinguish between "synchronous" updating, whereby the entire vector of the current guess θ_t is updated at each time t, and "asynchronous" updating, whereby only one component of θ_t is updated. In convex and nonconvex optimization, there is also the notion of "batch" updating, whereby some but not all components of θ_t are updated at each time t. In addition, there is a distinction between using a "local" clock versus a "global" clock. To date, convergence proofs for the case where a local clock is used assume that the measurement noise is an i.i.d. sequence, an assumption that does not hold in Reinforcement Learning (RL). In this note, we provide a general theory of convergence for batch asynchronous stochastic approximation (BASA) that applies whether the updates use a local clock or a global clock, for the case where the measurement noises form a martingale difference sequence. This is the most general result to date and encompasses all others.
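To make the distinctions above concrete, here is a minimal sketch in Python of a batch asynchronous SA update; it is an illustration, not the paper's algorithm or proof setting. The measurement model `g` (a noisy evaluation of 𝐟), the Bernoulli coordinate-selection rule with parameter `p`, and the step-size schedule α(n) = 1/(n+1) are all assumptions made for the example; the `clock` argument switches the step size between each coordinate's own update count (local clock) and the global iteration counter (global clock).

```python
# A minimal sketch of batch asynchronous stochastic approximation (BASA):
# at each step t, only a random subset S_t of coordinates is updated, with
# step sizes driven either by the global clock t or by each coordinate's
# local clock (its own update count). Illustrative assumptions throughout.
import numpy as np

def basa(g, theta0, T=10_000, p=0.5, clock="local", rng=None):
    """Seek theta with f(theta) = 0, given only noisy measurements g(theta)."""
    rng = rng or np.random.default_rng(0)
    theta = np.asarray(theta0, dtype=float).copy()
    d = theta.size
    counts = np.zeros(d)                    # local clocks: per-coordinate update counts
    for t in range(1, T + 1):
        S = rng.random(d) < p               # random "batch" of coordinates to update
        if not S.any():
            continue
        counts[S] += 1
        # Step size from each coordinate's local clock, or from the global clock t.
        n = counts[S] if clock == "local" else t
        alpha = 1.0 / (n + 1)
        theta[S] += alpha * g(theta)[S]     # noisy measurement of f at the current guess
    return theta

# Example: f(theta) = b - theta has root theta* = b; g adds zero-mean
# (martingale difference) noise to each measurement.
b = np.array([1.0, -2.0, 0.5])
rng = np.random.default_rng(1)
g = lambda th: (b - th) + 0.1 * rng.standard_normal(th.size)
print(basa(g, np.zeros(3), clock="local", rng=rng))   # approximately [1.0, -2.0, 0.5]
```

Setting p = 1 recovers synchronous updating, while forcing |S_t| = 1 recovers the classical asynchronous case, so batch updating interpolates between the two regimes discussed above.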
