Empirical Bayes PCA in high dimensions

12/21/2020
by Xinyi Zhong, et al.

When the dimension of data is comparable to or larger than the number of available data samples, Principal Components Analysis (PCA) is known to exhibit problematic phenomena of high-dimensional noise. In this work, we propose an Empirical Bayes PCA method that reduces this noise by estimating a structural prior for the joint distributions of the principal components. This EB-PCA method is based upon the classical Kiefer-Wolfowitz nonparametric MLE for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs, and iterative refinement using an Approximate Message Passing (AMP) algorithm. In theoretical "spiked" models, EB-PCA achieves Bayes-optimal estimation accuracy in the same settings as the oracle Bayes AMP procedure that knows the true priors. Empirically, EB-PCA can substantially improve over PCA when there is strong prior structure, both in simulation and on several quantitative benchmarks constructed using data from the 1000 Genomes Project and the International HapMap Project. A final illustration is presented for an analysis of gene expression data obtained by single-cell RNA-seq.
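The empirical Bayes step described above can be illustrated in a much simpler setting than the one the paper treats. The sketch below is not the authors' implementation: it assumes a scalar model y = theta + sigma * z with known sigma, approximates the Kiefer-Wolfowitz nonparametric MLE of the prior by running EM over weights on a fixed grid, and then denoises with the posterior mean under the fitted prior. The function names `npmle_grid` and `posterior_mean`, the grid size, and the two-point prior used to simulate data are all hypothetical choices made for illustration.

```python
import numpy as np


def npmle_grid(y, sigma, grid, n_iter=200):
    """Approximate the nonparametric MLE of the prior by EM over grid weights."""
    # lik[i, j] is proportional to the N(grid[j], sigma^2) density at y[i].
    lik = np.exp(-0.5 * ((y[:, None] - grid[None, :]) / sigma) ** 2)
    w = np.full(grid.size, 1.0 / grid.size)  # start from uniform weights
    for _ in range(n_iter):
        post = lik * w                        # unnormalized posterior over grid points
        post /= post.sum(axis=1, keepdims=True)
        w = post.mean(axis=0)                 # EM update of the mixing weights
    return w


def posterior_mean(y, sigma, grid, w):
    """Posterior mean of theta given y under the discrete prior (grid, w)."""
    lik = np.exp(-0.5 * ((y[:, None] - grid[None, :]) / sigma) ** 2)
    post = lik * w
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid


# Simulated data with strong prior structure (assumption: theta is +1 or -1).
rng = np.random.default_rng(0)
n = 2000
sigma = 1.0
theta = rng.choice([-1.0, 1.0], size=n)
y = theta + sigma * rng.normal(size=n)

grid = np.linspace(y.min(), y.max(), 101)
w = npmle_grid(y, sigma, grid)
theta_hat = posterior_mean(y, sigma, grid, w)

mse_raw = np.mean((y - theta) ** 2)          # error of the raw observations
mse_eb = np.mean((theta_hat - theta) ** 2)   # error after empirical Bayes denoising
print(f"raw MSE: {mse_raw:.3f}, empirical Bayes MSE: {mse_eb:.3f}")
```

In this toy model the prior is learned from the observations alone, which is the sense in which the procedure is "empirical" Bayes. In the abstract's setting the observations would instead be the sample PCs, with their noise law supplied by random matrix theory, and the denoiser would be refined iteratively by AMP; none of that is reproduced here.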

research
06/04/2021

PCA Initialization for Approximate Message Passing in Rotationally Invariant Models

We study the problem of estimating a rank-1 signal in the presence of ro...
research
05/10/2020

A nonparametric empirical Bayes approach to covariance matrix estimation

We propose an empirical Bayes method to estimate high-dimensional covari...
research
10/03/2022

Bayes-optimal limits in structured PCA, and how to reach them

We study the paradigmatic spiked matrix model of principal components an...
research
12/05/2020

Selecting the number of components in PCA via random signflips

Dimensionality reduction via PCA and factor analysis is an important too...
research
08/27/2020

Approximate Message Passing algorithms for rotationally invariant matrices

Approximate Message Passing (AMP) algorithms have seen widespread use ac...
research
10/28/2016

Correlated-PCA: Principal Components' Analysis when Data and Noise are Correlated

Given a matrix of observed data, Principal Components Analysis (PCA) com...
research
03/20/2018

Data Distillery: Effective Dimension Estimation via Penalized Probabilistic PCA

The paper tackles the unsupervised estimation of the effective dimension...
