Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction

06/28/2021
by Dominik Stöger, et al.

Recently there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and generalization and in particular the critical role of small random initialization are not fully understood. In this paper, we take a step towards demystifying this role by proving that small random initialization followed by a few iterations of gradient descent behaves akin to popular spectral methods. We also show that this implicit spectral bias from small random initialization, which is provably more prominent for overparameterized models, also puts the gradient descent iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well. Concretely, we focus on the problem of reconstructing a low-rank matrix from a few measurements via a natural nonconvex formulation. In this setting, we show that the trajectory of the gradient descent iterations from small random initialization can be approximately decomposed into three phases: (I) a spectral or alignment phase where we show that the iterates have an implicit spectral bias akin to spectral initialization, allowing us to show that at the end of this phase the column space of the iterates and the underlying low-rank matrix are sufficiently aligned, (II) a saddle avoidance/refinement phase where we show that the trajectory of the gradient iterates moves away from certain degenerate saddle points, and (III) a local refinement phase where we show that after avoiding the saddles the iterates converge quickly to the underlying low-rank matrix. Underlying our analysis are insights for the analysis of overparameterized nonconvex optimization schemes that may have implications for computational problems beyond low-rank reconstruction.
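For readers who want a concrete picture of the setup, the following is a minimal sketch (not the authors' code) of the setting described in the abstract: overparameterized low-rank matrix sensing solved by plain gradient descent from a small random initialization. The dimensions, measurement model, step size, initialization scale, and overparameterized width below are illustrative assumptions.

```python
# Minimal sketch: gradient descent from small random initialization for
# overparameterized low-rank matrix sensing. All problem sizes and
# hyperparameters are illustrative assumptions, not the paper's choices.
import numpy as np

rng = np.random.default_rng(0)

n, r_true, k = 30, 2, 8          # matrix size, true rank, overparameterized width (k > r_true)
m = 10 * n * r_true              # number of random linear measurements

# Ground-truth low-rank PSD matrix X* = U* U*^T, normalized so ||X*||_2 = 1
U_star = rng.standard_normal((n, r_true))
U_star /= np.sqrt(np.linalg.norm(U_star @ U_star.T, 2))
X_star = U_star @ U_star.T

# Gaussian measurement operator: y_i = <A_i, X*>
A = rng.standard_normal((m, n, n)) / np.sqrt(m)
y = np.einsum('mij,ij->m', A, X_star)

def loss_grad(U):
    """Gradient of f(U) = 0.5 * sum_i (<A_i, U U^T> - y_i)^2."""
    R = np.einsum('mij,ij->m', A, U @ U.T) - y      # residuals
    S = np.einsum('m,mij->ij', R, A)                # sum_i R_i * A_i
    return (S + S.T) @ U                            # sum_i R_i (A_i + A_i^T) U

# Small random initialization (scale alpha << 1), then plain gradient descent
alpha, eta, T = 1e-6, 0.05, 3000
U = alpha * rng.standard_normal((n, k))
for t in range(T):
    U = U - eta * loss_grad(U)

print("relative error:", np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star))
```

In this sketch, shrinking the initialization scale alpha lengthens the initial alignment phase but, in line with the result above, biases the iterates toward the low-rank ground truth even though the factor U has more columns than the true rank.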


