Lazy stochastic principal component analysis

09/21/2017
by Michael Wojnowicz, et al.

Stochastic principal component analysis (SPCA) has become a popular dimensionality reduction strategy for large, high-dimensional datasets. We derive a simplified algorithm, called Lazy SPCA, which has reduced computational complexity and is better suited for large-scale distributed computation. We prove that SPCA and Lazy SPCA find the same approximations to the principal subspace, and that the pairwise distances between samples in the lower-dimensional space are invariant to whether SPCA is executed lazily or not. Empirical studies find downstream predictive performance to be identical for both methods, and superior to random projections, across a range of predictive models (linear regression, logistic lasso, and random forests). In our largest experiment, with 4.6 million samples, Lazy SPCA reduced 43.7 hours of computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix multiplications, apart from one operation on a small square matrix whose size depends only on the target dimensionality.
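
The abstract does not spell out the steps of either algorithm, so the following is only a minimal NumPy sketch of a generic randomized PCA (random sketch, QR, small SVD), not the paper's Lazy SPCA. The names `randomized_pca` and `pairwise_distances`, the `oversample` parameter, and the synthetic data are all assumptions made for illustration. The sketch also checks a general fact related to the distance claim: rotating an embedding by an orthogonal matrix leaves pairwise distances unchanged. Whether the paper's proof uses this argument is not stated in the abstract.

```python
import numpy as np


def randomized_pca(X, k, oversample=10, seed=0):
    """Generic randomized PCA sketch (hypothetical helper, not the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                       # center each feature
    omega = rng.standard_normal((Xc.shape[1], k + oversample))
    Y = Xc @ omega                                # random sketch of the data
    Q, _ = np.linalg.qr(Y)                        # orthonormal basis for the sketch
    B = Q.T @ Xc                                  # small (k + oversample) x d matrix
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    V = Vt[:k].T                                  # d x k approximate principal directions
    return Xc @ V, V                              # embedding (n x k) and directions


def pairwise_distances(Z):
    """Euclidean distance between every pair of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Synthetic data, assumed purely for the demonstration.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
Z, V = randomized_pca(X, k=5)

# A random orthogonal 5 x 5 matrix; applying it rotates the embedding.
R, _ = np.linalg.qr(rng.standard_normal((5, 5)))
same = np.allclose(pairwise_distances(Z), pairwise_distances(Z @ R))
print("distances preserved under rotation:", same)
```

Everything here except the centering and the one small SVD is a matrix multiplication or a QR step; the abstract indicates that Lazy SPCA reduces the non-multiplication work further, to a single operation on a small square matrix.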

research
01/03/2019

Projecting "better than randomly": How to reduce the dimensionality of very large datasets in a way that outperforms random projections

For very large datasets, random projections (RP) have become the tool of...
research
01/07/2019

Stochastic Approximation Algorithms for Principal Component Analysis

Principal Component Analysis is a novel way of dimensionality reducti...
research
11/10/2016

Policy Search with High-Dimensional Context Variables

Direct contextual policy search methods learn to improve policy paramete...
research
12/15/2017

Sparse principal component analysis via random projections

We introduce a new method for sparse principal component analysis, based...
research
12/25/2017

Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding

t-distributed Stochastic Neighborhood Embedding (t-SNE) is a method for ...
research
02/12/2020

Mycorrhiza: genotype assignment using phylogenetic networks

Motivation The genotype assignment problem consists of predicting, from ...
research
09/26/2022

On Projections to Linear Subspaces

The merit of projecting data onto linear subspaces is well known from, e...
