Convergence to minima for the continuous version of Backtracking Gradient Descent

11/11/2019
by Tuyen Trung Truong, et al.

The main result of this paper is the following.

Theorem. Let f: R^k → R be a C^1 function such that ∇f is locally Lipschitz continuous. Assume moreover that f is C^2 near its generalised saddle points. Fix real numbers δ_0 > 0 and 0 < α < 1. Then there is a smooth function h: R^k → (0, δ_0] so that the map H: R^k → R^k defined by H(x) = x - h(x)∇f(x) has the following properties:

(i) For all x ∈ R^k, we have f(H(x)) - f(x) ≤ -α h(x)||∇f(x)||^2.

(ii) For every x_0 ∈ R^k, the sequence x_{n+1} = H(x_n) satisfies either lim_{n→∞} ||x_{n+1} - x_n|| = 0 or lim_{n→∞} ||x_n|| = ∞. Every cluster point of {x_n} is a critical point of f. If moreover f has at most countably many critical points, then {x_n} either converges to a critical point of f or satisfies lim_{n→∞} ||x_n|| = ∞.

(iii) There is a set E_1 ⊂ R^k of Lebesgue measure 0 so that for all x_0 ∈ R^k \ E_1, the sequence x_{n+1} = H(x_n), if it converges, cannot converge to a generalised saddle point.

(iv) There is a set E_2 ⊂ R^k of Lebesgue measure 0 so that for all x_0 ∈ R^k \ E_2, no cluster point of the sequence x_{n+1} = H(x_n) is a saddle point; more generally, no cluster point is an isolated generalised saddle point.

Some other results are also proven.
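The descent condition in (i) is the Armijo-type inequality enforced by backtracking line search. The paper constructs a single smooth step-size function h, but the discrete analogue can be sketched with the standard backtracking scheme: start from the cap δ_0 and shrink the step geometrically until the condition f(x - h∇f(x)) - f(x) ≤ -α h ||∇f(x)||^2 holds. The shrink factor β and the test function below are illustrative choices, not taken from the paper.

```python
import numpy as np

def backtracking_gd(f, grad_f, x0, delta0=1.0, alpha=0.5, beta=0.5,
                    tol=1e-8, max_iter=10_000):
    """Gradient descent with backtracking line search.

    At each iterate, the step size h starts at delta0 and is halved
    (factor beta) until the Armijo-type descent condition
        f(x - h*g) - f(x) <= -alpha * h * ||g||^2
    holds, where g = grad_f(x).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:  # near a critical point; stop
            break
        h = delta0
        # Shrink h until the descent condition from part (i) is satisfied.
        while f(x - h * g) - f(x) > -alpha * h * np.dot(g, g):
            h *= beta
        x = x - h * g
    return x

# Illustrative example: minimise the quadratic f(x) = ||x||^2.
f = lambda x: float(np.dot(x, x))
grad_f = lambda x: 2.0 * x
x_min = backtracking_gd(f, grad_f, x0=[3.0, -4.0])
# x_min is (numerically) the unique critical point at the origin.
```

Note that the theorem's H uses a smooth h(x) chosen once for the whole space, whereas the loop above recomputes the step at every iterate; the sketch only illustrates the descent inequality, not the smooth construction.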
