Subexponential-Time Algorithms for Sparse PCA

by Yunzi Ding, et al.

We study the computational cost of recovering a unit-norm sparse principal component x ∈ ℝ^n planted in a random matrix, in either the Wigner or Wishart spiked model (observing either W + λ xx^⊤ with W drawn from the Gaussian orthogonal ensemble, or N independent samples from N(0, I_n + β xx^⊤), respectively). Prior work has shown that when the signal-to-noise ratio (λ or β√(N/n), respectively) is a small constant and the fraction of nonzero entries in the planted vector is ‖x‖_0 / n = ρ, it is possible to recover x in polynomial time if ρ ≲ 1/√(n). While it is possible to recover x in exponential time under the weaker condition ρ ≪ 1, it is believed that polynomial-time recovery is impossible unless ρ ≲ 1/√(n). We investigate the precise amount of time required for recovery in the "possible but hard" regime 1/√(n) ≪ ρ ≪ 1 by exploring the power of subexponential-time algorithms, i.e., algorithms running in time exp(n^δ) for some constant δ ∈ (0,1). For any 1/√(n) ≪ ρ ≪ 1, we give a recovery algorithm with runtime roughly exp(ρ^2 n), demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the exp(ρ n)-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal.
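To make the setup concrete, here is a minimal sketch of the spiked Wigner model and the polynomial-time diagonal thresholding baseline mentioned above: plant a k-sparse unit vector, observe Y = λ xx^⊤ + W, pick the k coordinates with the largest diagonal entries (since Y_ii ≈ λ x_i^2), and take the top eigenvector of the corresponding principal submatrix. The specific values of n, ρ, and λ below are illustrative choices, not parameters from the paper; this is the easy regime ρ ≲ 1/√(n) where diagonal thresholding is expected to succeed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the paper): here rho ~ 1/sqrt(n),
# the regime where diagonal thresholding succeeds.
n, rho, lam = 2000, 0.01, 5.0
k = int(rho * n)

# Planted k-sparse unit-norm vector x.
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)

# Spiked Wigner observation Y = lam * x x^T + W, with W a GOE matrix
# normalized so its spectrum has size O(1).
G = rng.normal(size=(n, n))
W = (G + G.T) / np.sqrt(2 * n)
Y = lam * np.outer(x, x) + W

# Diagonal thresholding: Y_ii concentrates around lam * x_i^2, so the
# k largest diagonal entries estimate the support of x.
S = np.argsort(Y.diagonal())[-k:]

# Top eigenvector of the k x k principal submatrix, embedded back in R^n.
vals, vecs = np.linalg.eigh(Y[np.ix_(S, S)])
xhat = np.zeros(n)
xhat[S] = vecs[:, -1]

overlap = abs(xhat @ x)  # close to 1 when recovery succeeds
print(f"overlap |<xhat, x>| = {overlap:.3f}")
```

The subexponential-time family in the paper replaces the single-coordinate diagonal statistic with statistics over larger subsets of coordinates, trading runtime for tolerance of larger ρ; the sketch above only illustrates the polynomial-time endpoint of that tradeoff.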




