Optimal Diagonal Preconditioning: Theory and Practice

by Zhaonan Qu et al.

Preconditioning is a staple technique in optimization and machine learning. It often reduces the condition number of the matrix it is applied to, thereby speeding up the convergence of optimization algorithms. Although many preconditioning techniques are popular in practice, most lack theoretical guarantees on the reduction in condition number. In this paper, we study the problem of optimal diagonal preconditioning: achieving the maximal reduction in the condition number of any full-rank matrix by scaling its rows, its columns, or both. We first reformulate the problem as a quasi-convex problem and provide a baseline bisection algorithm that is easy to implement in practice, where each iteration solves an SDP feasibility problem. We then propose a polynomial-time potential reduction algorithm with O(log(1/ϵ)) iteration complexity, where each iteration consists of a Newton update based on the Nesterov-Todd direction. Our algorithm is based on a formulation of the problem that generalizes the von Neumann optimal growth problem. Next, we specialize to one-sided optimal diagonal preconditioning problems and show that they can be formulated as standard dual SDP problems, to which we apply efficient customized solvers, and we study the empirical performance of the resulting optimal diagonal preconditioners. Extensive experiments on large matrices demonstrate that optimal diagonal preconditioners reduce condition numbers substantially more than heuristic preconditioners.
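To make the setting concrete, the following sketch illustrates two-sided diagonal scaling of a symmetric positive definite 2x2 matrix using the simple Jacobi heuristic D = diag(M)^(-1/2) (not the paper's optimal algorithm, which is guaranteed to do at least as well). The matrix and the closed-form 2x2 eigenvalue formula are illustrative choices, not taken from the paper.

```python
# Hedged illustration: Jacobi diagonal scaling of a 2x2 SPD matrix.
# The optimal diagonal preconditioner studied in the paper matches or
# beats this heuristic; here we only show that even simple diagonal
# scaling can shrink the condition number dramatically.
import math

def eig_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]]."""
    half_trace = (a + c) / 2.0
    disc = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return half_trace + disc, half_trace - disc

def condition_number(a, b, c):
    """Spectral condition number lambda_max / lambda_min (SPD case)."""
    lam_max, lam_min = eig_sym_2x2(a, b, c)
    return lam_max / lam_min

# An ill-conditioned SPD matrix M = [[100, 9], [9, 1]] (made up for the demo).
a, b, c = 100.0, 9.0, 1.0
kappa_before = condition_number(a, b, c)

# Jacobi scaling: M' = D M D with D = diag(1/sqrt(a), 1/sqrt(c)),
# so M' has unit diagonal and off-diagonal entry b / sqrt(a * c).
b_scaled = b / math.sqrt(a * c)
kappa_after = condition_number(1.0, b_scaled, 1.0)

print(f"kappa before scaling: {kappa_before:.1f}")  # ~535
print(f"kappa after scaling:  {kappa_after:.1f}")   # 19.0
```

The paper's bisection baseline would instead search over the target condition number κ, checking at each step (via an SDP feasibility problem) whether some diagonal D achieves condition number at most κ, rather than committing to the diagonal-of-M heuristic used above.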



