Bayes-optimal limits in structured PCA, and how to reach them

by Jean Barbier, et al.

We study the paradigmatic spiked matrix model of principal component analysis, in which a rank-one signal is corrupted by additive noise. While the noise is typically drawn from a Wigner matrix with independent entries, here the potential acting on the eigenvalues has a quadratic plus a quartic component. The quartic term induces strong correlations between the matrix elements, which makes the setting relevant for applications but analytically challenging. Our work provides the first characterization of the Bayes-optimal limits for inference in this model with structured noise. If the signal prior is rotationally invariant, we show that a spectral estimator is optimal. In contrast, for more general priors, the existing approximate message passing (AMP) algorithm falls short of the information-theoretic limits, and we provide a justification for this sub-optimality. Finally, by generalizing the theory of Thouless-Anderson-Palmer equations, we cure the issue with a novel AMP that matches the theoretical limits. Our information-theoretic analysis is based on the replica method, a powerful heuristic from statistical mechanics; the novel AMP, instead, comes with a rigorous state evolution analysis tracking its performance in the high-dimensional limit. Although we focus on a specific noise distribution, our methodology can be generalized to a wide class of trace ensembles, at the cost of more involved expressions.
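The setting can be made concrete with a small simulation. The sketch below is an illustration, not the authors' code, and it rests on several assumptions: an observation model Y = (θ/N) x xᵀ + Z, a noise density proportional to exp(-(N/2) Tr V(Z)) with V(z) = μz²/2 + γz⁴/4, and the standard equilibrium eigenvalue density of that quartic potential. The parameters θ, μ, γ and all function names are hypothetical choices. It builds rotationally invariant noise Z = O D Oᵀ from a Haar orthogonal O and eigenvalues placed at the quantiles of the equilibrium density, then applies the simplest spectral estimator, the top eigenvector of Y, to a Gaussian (hence rotationally invariant) signal. The optimal spectral estimator analysed in the paper may apply further processing to the spectrum.

```python
import numpy as np


def quartic_edge_sq(mu, gamma):
    """a^2 such that the support is [-2a, 2a]; solves mu*a^2 + 3*gamma*a^4 = 1."""
    if gamma == 0:
        return 1.0 / mu
    return (-mu + np.sqrt(mu**2 + 12.0 * gamma)) / (6.0 * gamma)


def quartic_density(x, mu, gamma):
    """Equilibrium density for V(z) = mu/2 z^2 + gamma/4 z^4 (assumed convention)."""
    a2 = quartic_edge_sq(mu, gamma)
    poly = mu + 2.0 * a2 * gamma + gamma * x**2
    return poly * np.sqrt(np.clip(4.0 * a2 - x**2, 0.0, None)) / (2.0 * np.pi)


def quartic_spectrum(n, mu, gamma, grid=20001):
    """Place n eigenvalues at the quantiles of the equilibrium density."""
    edge = 2.0 * np.sqrt(quartic_edge_sq(mu, gamma))
    x = np.linspace(-edge, edge, grid)
    cdf = np.cumsum(quartic_density(x, mu, gamma))
    cdf /= cdf[-1]  # numerical CDF on the grid
    levels = (np.arange(n) + 0.5) / n
    return np.interp(levels, cdf, x)  # inverse CDF by interpolation


def haar_orthogonal(n, rng):
    """Haar-distributed orthogonal matrix via QR with sign correction."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # flip column signs to remove QR bias


def spiked_pca_overlap(n=400, theta=5.0, mu=1.0, gamma=1.0, seed=0):
    """Top-eigenvector PCA on Y = (theta/n) x x^T + Z with rotationally invariant Z."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)  # Gaussian signal, a rotationally invariant prior
    O = haar_orthogonal(n, rng)
    Z = (O * quartic_spectrum(n, mu, gamma)) @ O.T  # O diag(d) O^T
    Y = (theta / n) * np.outer(x, x) + Z
    eigvals, eigvecs = np.linalg.eigh(Y)  # eigenvalues in ascending order
    v = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
    overlap = abs(v @ x) / np.linalg.norm(x)
    edge = 2.0 * np.sqrt(quartic_edge_sq(mu, gamma))
    return overlap, eigvals[-1], edge
```

Calling `spiked_pca_overlap()` returns the normalized overlap between the estimate and the signal, the top eigenvalue of Y, and the right edge 2a of the noise spectrum. Lowering `theta` towards the spectral threshold should make the outlier merge into the bulk and the overlap drop towards zero.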




Optimal Algorithms for the Inhomogeneous Spiked Wigner Model

In this paper, we study a spiked Wigner problem with an inhomogeneous no...

PCA Initialization for Approximate Message Passing in Rotationally Invariant Models

We study the problem of estimating a rank-1 signal in the presence of ro...

Matrix Inference in Growing Rank Regimes

The inference of a large symmetric signal-matrix 𝐒∈ℝ^{N×N} corrupted by a...

Empirical Bayes PCA in high dimensions

When the dimension of data is comparable to or larger than the number of...

Streaming Bayesian inference: theoretical limits and mini-batch approximate message-passing

In statistical learning for real-world large-scale data problems, one mu...

Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models

We study the problem of recovering an unknown signal x given measurement...

The price of ignorance: how much does it cost to forget noise structure in low-rank matrix estimation?

We consider the problem of estimating a rank-1 signal corrupted by struc...
