Optimizing Spectral Sums using Randomized Chebyshev Expansions
The traces of matrix functions, often called spectral sums (e.g., rank, log-determinant, and nuclear norm), appear in many machine learning tasks. However, optimizing or computing such (parameterized) spectral sums typically requires a matrix decomposition whose cost is cubic in the matrix dimension, which is prohibitive for large-scale applications. Several recent works have proposed approximating large-scale spectral sums using polynomial function approximations and stochastic trace estimators. However, all prior works along this line study biased estimators, and adapting them directly to optimization under stochastic gradient descent (SGD) frameworks often fails, as the accumulated bias prevents stable convergence to the optimum. To address this issue, we propose a provably optimal unbiased estimator obtained by randomizing the Chebyshev polynomial degree. We further introduce two additional techniques for accelerating SGD, whose key idea is to share randomness among the many estimations performed during the iterative procedure. Finally, we showcase two applications of the proposed SGD schemes, matrix completion and Gaussian process learning, on real-world datasets.
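To make the high-level recipe concrete, below is a minimal illustrative sketch (not the paper's exact algorithm) of an unbiased estimator of tr(f(A)) for a symmetric matrix A: it combines a Hutchinson (Rademacher) trace estimator with a Chebyshev expansion whose truncation degree is drawn at random and whose coefficients are reweighted by the survival probability of the sampled degree, so the truncation bias cancels in expectation. The function and parameter names (`unbiased_spectral_sum`, `p_stop`, `n_probe`, `max_deg`) are illustrative assumptions, and the spectrum bounds `lam_min`/`lam_max` are assumed to be known.

```python
import numpy as np

def chebyshev_coeffs(f, n):
    """First n+1 Chebyshev coefficients of f on [-1, 1] via Chebyshev-Gauss nodes."""
    k = np.arange(n + 1)
    nodes = np.cos(np.pi * (k + 0.5) / (n + 1))
    fv = f(nodes)
    c = np.array([2.0 / (n + 1) * np.sum(fv * np.cos(np.pi * j * (k + 0.5) / (n + 1)))
                  for j in range(n + 1)])
    c[0] /= 2.0
    return c

def unbiased_spectral_sum(A, f, lam_min, lam_max, max_deg=50, p_stop=0.05,
                          n_probe=10, rng=None):
    """Illustrative unbiased estimate of tr(f(A)), A symmetric with spectrum in [lam_min, lam_max].

    The truncation degree N is sampled from a (capped) geometric distribution and each
    retained coefficient c_j is divided by P(N >= j), so the estimator is unbiased for the
    degree-max_deg Chebyshev approximation of tr(f(A)).
    """
    rng = np.random.default_rng(rng)
    d = A.shape[0]
    a, b = (lam_max - lam_min) / 2.0, (lam_max + lam_min) / 2.0
    g = lambda x: f(a * x + b)                      # f pulled back to [-1, 1]
    c = chebyshev_coeffs(g, max_deg)
    A_tilde = (A - b * np.eye(d)) / a               # spectrum mapped into [-1, 1]

    # Random truncation degree N >= 1 and survival probabilities P(N >= j).
    N = int(min(rng.geometric(p_stop), max_deg))
    surv = np.concatenate(([1.0], (1.0 - p_stop) ** np.arange(N)))  # surv[j] = P(N >= j)

    est = 0.0
    for _ in range(n_probe):
        v = rng.choice([-1.0, 1.0], size=d)         # Rademacher probe vector
        w_prev, w_curr = v, A_tilde @ v             # T_0(A~)v and T_1(A~)v
        acc = c[0] * (v @ w_prev)
        if N >= 1:
            acc += (c[1] / surv[1]) * (v @ w_curr)
        for j in range(2, N + 1):
            w_next = 2.0 * A_tilde @ w_curr - w_prev  # Chebyshev three-term recurrence
            acc += (c[j] / surv[j]) * (v @ w_next)
            w_prev, w_curr = w_curr, w_next
        est += acc
    return est / n_probe
```

As a usage sketch, `unbiased_spectral_sum(A, np.log, lam_min, lam_max)` would estimate the log-determinant of a positive definite A; since each call touches only matrix-vector products and a randomly truncated expansion, its expectation matches the fixed-degree approximation, which is the property SGD needs to avoid accumulating bias across iterations.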