Flexible Regularized Estimation in High-Dimensional Mixed Membership Models
Mixed membership models are an extension of finite mixture models in which each observation can belong partially to more than one mixture component. We introduce a probabilistic framework for mixed membership models of high-dimensional continuous data with a focus on scalability and interpretability. We derive a novel probabilistic representation of mixed membership based on direct convex combinations of dependent multivariate Gaussian random vectors. In this setting, scalability is ensured by approximating a tensor covariance structure with multivariate eigen-approximations, with adaptive regularization imposed through shrinkage priors. Conditional posterior consistency is established on an unconstrained model, allowing us to facilitate a simple posterior sampling scheme while retaining many of the desired theoretical properties of our model. Our work is motivated by two biomedical case studies: a case study on functional brain imaging of children with autism spectrum disorder (ASD) and a case study on gene expression data from breast cancer tissue. Through these applications, we highlight how the typical assumption made in cluster analysis, that each observation comes from one homogeneous subgroup, may often be restrictive in biomedical applications, leading to unnatural interpretations of data features.
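To make the generative idea concrete, the following is a minimal sketch (not the paper's estimation procedure) of mixed membership as a direct convex combination of dependent multivariate Gaussian random vectors: each observation draws simplex-valued membership weights and mixes component-specific Gaussian draws. All dimensions, the Dirichlet weight prior, and the shared covariance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): K components, P-dimensional data.
K, P, N = 3, 10, 5

# Component means and a shared covariance that induces dependence across features.
means = rng.normal(size=(K, P))
A = rng.normal(size=(P, P))
cov = A @ A.T + np.eye(P)  # symmetric positive definite by construction

def sample_mixed_membership(n):
    """Draw n observations as convex combinations of K Gaussian vectors."""
    X = np.empty((n, P))
    Z = rng.dirichlet(np.ones(K), size=n)  # membership weights on the simplex
    for i in range(n):
        # K dependent multivariate Gaussian feature vectors for observation i.
        comps = np.stack(
            [rng.multivariate_normal(means[k], cov) for k in range(K)]
        )
        X[i] = Z[i] @ comps  # direct convex combination of the K vectors
    return X, Z

X, Z = sample_mixed_membership(N)
print(X.shape, Z.shape)                 # data and membership-weight shapes
print(np.allclose(Z.sum(axis=1), 1.0))  # each weight vector sums to one
```

In this sketch, setting a weight vector to a vertex of the simplex recovers an ordinary finite-mixture draw from a single component, which is the special case the abstract contrasts with partial membership.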