Logistic principal component analysis via non-convex singular value thresholding

by Yipeng Song, et al.

Multivariate binary data is becoming abundant in current biological research. Logistic principal component analysis (PCA) is a commonly used tool for exploring the relationships within a multivariate binary data set by exploiting its underlying low-rank structure. We re-express the logistic PCA model based on the latent variable interpretation of the generalized linear model for binary data: the multivariate binary data set is assumed to be the sign observation of an unobserved quantitative data set, on which a low-rank structure is assumed to exist. However, the standard logistic PCA model (using an exact low-rank constraint) is prone to overfitting, which can cause some estimated parameters to diverge towards infinity. We propose to fit a logistic PCA model through non-convex singular value thresholding to alleviate this overfitting. An efficient Majorization-Minimization (MM) algorithm is implemented to fit the model, and a missing-value-based cross validation (CV) procedure is introduced for model selection. Our experiments on realistic simulations of imbalanced binary data with a low signal-to-noise ratio show that the CV-error-based model selection procedure successfully selects the proposed model. Furthermore, the selected model recovers the underlying low-rank structure better than models with a convex nuclear norm penalty or an exact low-rank constraint. A binary copy number aberration data set is used to illustrate the proposed methodology in practice.
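The abstract describes the fitting strategy without formulas, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the logit matrix is Theta = 1 mu^T + Z with Z low rank, and it uses a log-type concave penalty lam * sum_r log(1 + s_r(Z) / gamma) as one example of a non-convex singular value penalty (the paper's exact penalty, tuning and stopping rules may differ). Each MM step majorizes the logistic loss with a quadratic (its curvature is bounded by 1/4) and the concave penalty with its tangent line, so the update reduces to a weighted singular value thresholding. Missing entries are handled through a 0/1 weight matrix, which is what allows a missing-value-based CV. The names `fit_logistic_pca`, `cv_error`, `gamma` and `frac` are hypothetical.

```python
import numpy as np


def sigmoid(t):
    # Logistic function written via tanh to avoid overflow for large |t|
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def logistic_loss(X, Theta, W):
    # Negative Bernoulli log-likelihood summed over entries where W == 1;
    # np.logaddexp(0, t) evaluates log(1 + exp(t)) stably
    return np.sum(W * (np.logaddexp(0.0, Theta) - X * Theta))


def fit_logistic_pca(X, lam, gamma=1.0, W=None, max_iter=500, tol=1e-7):
    """MM sketch for Theta = 1 mu^T + Z.

    Assumption: log-type (GDP-like) penalty lam * sum_r log(1 + s_r(Z) / gamma),
    chosen for illustration only.
    """
    n, p = X.shape
    W = np.ones((n, p)) if W is None else W
    mu, Z = np.zeros(p), np.zeros((n, p))
    s_cur = np.zeros(min(n, p))    # singular values of the current Z
    L = 0.25                       # bound on the curvature of the logistic loss
    prev = np.inf
    for _ in range(max_iter):
        Theta = mu + Z             # mu is broadcast over the rows
        # Quadratic majorizer of the loss: (L/2) * ||Theta - H||_F^2 + const;
        # missing entries (W == 0) have zero gradient, so H keeps Theta there
        H = Theta - W * (sigmoid(Theta) - X) / L
        mu = H.mean(axis=0)        # offset update: column means of H
        # Tangent-line majorizer of the concave penalty: weights g'(s) = 1 / (gamma + s),
        # non-decreasing because s_cur is sorted in descending order
        w = 1.0 / (gamma + s_cur)
        U, s, Vt = np.linalg.svd(H - mu, full_matrices=False)
        s_cur = np.maximum(s - lam * w / L, 0.0)   # weighted singular value thresholding
        Z = (U * s_cur) @ Vt
        obj = logistic_loss(X, mu + Z, W) + lam * np.sum(np.log1p(s_cur / gamma))
        if prev - obj <= tol * max(1.0, abs(obj)):
            break
        prev = obj
    return mu, Z


def cv_error(X, lam, gamma=1.0, frac=0.1, seed=0):
    # Hold out a random fraction of entries as missing, fit on the rest,
    # and score the held-out entries by their mean negative log-likelihood
    rng = np.random.default_rng(seed)
    test = rng.random(X.shape) < frac
    mu, Z = fit_logistic_pca(X, lam, gamma, W=(~test).astype(float))
    return logistic_loss(X, mu + Z, test.astype(float)) / max(test.sum(), 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p, R = 100, 50, 3
    # Latent-variable view: x_ij = 1[theta_ij + eps_ij > 0] with logistic noise,
    # so P(x_ij = 1) = sigmoid(theta_ij); the -1.5 offset makes the data imbalanced
    Theta_true = -1.5 + rng.normal(size=(n, R)) @ rng.normal(size=(R, p))
    X = (Theta_true + rng.logistic(size=(n, p)) > 0).astype(float)
    for lam in [1.0, 5.0, 20.0]:
        print(f"lam={lam:5.1f}  CV error={cv_error(X, lam):.4f}")
```

Because every step minimizes a majorizer, the penalized objective should not increase across iterations, and one would pick the `lam` with the smallest CV error. As `gamma` grows, log(1 + s / gamma) approaches s / gamma, so the penalty behaves like a scaled convex nuclear norm; a small `gamma` penalizes large singular values much less and behaves more like a rank penalty.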

