Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning
The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a solution to an equation of the form f(θ) = 0, where f : ℝ^d → ℝ^d, when only noisy measurements of f(·) are available. In the literature to date, one can distinguish between "synchronous" updating, whereby the entire vector of the current guess θ_t is updated at each time, and "asynchronous" updating, whereby only one component of θ_t is updated. In convex and nonconvex optimization, there is also the notion of "batch" updating, whereby some but not all components of θ_t are updated at each time t. There is also a distinction between using a "local" clock versus a "global" clock. In the literature to date, convergence proofs when a local clock is used assume that the measurement noise is an i.i.d. sequence, an assumption that does not hold in Reinforcement Learning (RL). In this note, we provide a general theory of convergence for batch asynchronous stochastic approximation (BASA) that works whether the updates use a local clock or a global clock, for the case where the measurement noises form a martingale difference sequence. This is the most general result to date, and it encompasses all others.
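For concreteness, a batch asynchronous update of the kind described above can be written in the following standard form (the notation here is illustrative and not necessarily that of the paper): at each time t, a subset S_t ⊆ {1, …, d} of components is updated,

\[
\theta_{t+1,i} =
\begin{cases}
\theta_{t,i} + \alpha_{\nu(t,i)}\,\bigl(f_i(\theta_t) + \xi_{t+1,i}\bigr), & i \in S_t,\\[4pt]
\theta_{t,i}, & i \notin S_t,
\end{cases}
\]

where \(\xi_{t+1}\) is the measurement noise, here assumed to form a martingale difference sequence, and \(\{\alpha_k\}\) is the step-size sequence. The clock \(\nu(t,i)\) determines which step size is applied: under a global clock, \(\nu(t,i) = t\) for every component, whereas under a local clock, \(\nu(t,i) = \sum_{\tau \le t} \mathbb{1}\{i \in S_\tau\}\) counts only the times component i itself has been updated. Synchronous updating corresponds to \(S_t = \{1, \ldots, d\}\), and classical asynchronous updating to \(|S_t| = 1\).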