HeMPPCAT: Mixtures of Probabilistic Principal Component Analysers for Data with Heteroscedastic Noise
Mixtures of probabilistic principal component analysis (MPPCA) is a well-known mixture model extension of principal component analysis (PCA). Similar to PCA, MPPCA assumes the data samples in each mixture contain homoscedastic noise. However, datasets with heterogeneous noise across samples are becoming increasingly common, as larger datasets are generated by collecting samples from several sources with varying noise profiles. The performance of MPPCA is suboptimal for data with heteroscedastic noise across samples. This paper proposes a heteroscedastic mixtures of probabilistic PCA technique (HeMPPCAT) that uses a generalized expectation-maximization (GEM) algorithm to jointly estimate the unknown underlying factors, means, and noise variances under a heteroscedastic noise setting. Simulation results illustrate the improved factor estimates and clustering accuracies of HeMPPCAT compared to MPPCA.
READ FULL TEXT