Microbiome subcommunity learning with logistic-tree normal latent Dirichlet allocation

by Patrick LeBlanc, et al.

Mixed-membership (MM) models such as latent Dirichlet allocation (LDA) have been applied to microbiome compositional data to identify latent subcommunities of microbial species. However, microbiome compositional data, especially those collected from the gut, typically display substantial cross-sample heterogeneity in subcommunity composition, which current MM methods do not account for. To address this limitation, we incorporate the logistic-tree normal (LTN) model, which uses the phylogenetic tree structure, into the LDA model to form a new MM model. This model allows the composition of each subcommunity to vary across samples around a “centroid” composition. Introducing auxiliary Pólya-Gamma variables enables a computationally efficient collapsed blocked Gibbs sampler for Bayesian inference under this model. We compare the new model with LDA and show that, in the presence of large cross-sample heterogeneity, inference under LDA can be extremely sensitive to the specified number of subcommunities, because LDA does not account for that heterogeneity. As such, the strategy popular in other applications of MM models, overspecifying the number of subcommunities in the hope that some meaningful subcommunities will emerge among artificial ones, can lead to highly misleading conclusions in the microbiome context. In contrast, by accounting for such heterogeneity, our MM model restores the robustness of inference to the specified number of subcommunities and again allows meaningful subcommunities to be identified under this strategy.
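
To make the generative idea concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a toy balanced tree over four taxa, an isotropic Gaussian spread `sd` around each centroid (the paper's covariance structure may well differ), and a symmetric Dirichlet prior with parameter `alpha`. The names `LEAF_PATHS`, `tree_to_composition`, and `simulate` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Toy balanced binary tree over 4 taxa (an assumption for illustration).
# Each leaf is given by its root-to-leaf path: (internal node, goes_left).
LEAF_PATHS = [
    [(0, True), (1, True)],
    [(0, True), (1, False)],
    [(0, False), (2, True)],
    [(0, False), (2, False)],
]
N_NODES = 3  # number of internal nodes in the toy tree


def tree_to_composition(log_odds):
    """Map per-node log-odds of branching left to leaf probabilities."""
    p_left = 1.0 / (1.0 + np.exp(-np.asarray(log_odds, dtype=float)))
    probs = []
    for path in LEAF_PATHS:
        p = 1.0
        for node, goes_left in path:
            p *= p_left[node] if goes_left else 1.0 - p_left[node]
        probs.append(p)
    return np.array(probs)


def simulate(n_samples, n_reads, centroids, sd, alpha, rng):
    """Simulate taxon counts; centroids has shape (K, N_NODES)."""
    n_sub = centroids.shape[0]
    counts = np.zeros((n_samples, len(LEAF_PATHS)), dtype=int)
    for d in range(n_samples):
        # Sample-level mixing proportions over subcommunities (as in LDA).
        phi = rng.dirichlet(alpha * np.ones(n_sub))
        # Sample-specific log-odds scattered around each centroid
        # (isotropic noise is an assumption of this sketch).
        psi = centroids + sd * rng.standard_normal(centroids.shape)
        # Assign each read to a subcommunity, then draw its taxon.
        z = rng.choice(n_sub, size=n_reads, p=phi)
        for k in range(n_sub):
            n_k = int(np.sum(z == k))
            counts[d] += rng.multinomial(n_k, tree_to_composition(psi[k]))
    return counts


rng = np.random.default_rng(0)
centroids = np.array([[2.0, 1.0, 0.0], [-2.0, 0.0, -1.0]])
counts = simulate(n_samples=5, n_reads=1000, centroids=centroids,
                  sd=0.5, alpha=1.0, rng=rng)
```

Setting `sd=0` in this sketch makes every sample share the same subcommunity compositions, which corresponds to the fixed-composition assumption the abstract attributes to LDA; a larger `sd` mimics stronger cross-sample heterogeneity.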


Discriminative Topic Modeling with Logistic LDA

Despite many years of research into latent Dirichlet allocation (LDA), a...

Hyperspectral Unmixing with Endmember Variability using Partial Membership Latent Dirichlet Allocation

The application of Partial Membership Latent Dirichlet Allocation (PM-LDA...

Logistic-tree normal model for microbiome compositions

We introduce a probabilistic model, called the "logistic-tree normal" (L...

Blocking Collapsed Gibbs Sampler for Latent Dirichlet Allocation Models

The latent Dirichlet allocation (LDA) model is a widely-used latent vari...

A new LDA formulation with covariates

The Latent Dirichlet Allocation (LDA) model is a popular method for crea...

The Exact Asymptotic Form of Bayesian Generalization Error in Latent Dirichlet Allocation

Latent Dirichlet allocation (LDA) obtains essential information from dat...

A flexible Bayesian tool for CoDa mixed models: logistic-normal distribution with Dirichlet covariance

Compositional Data Analysis (CoDa) has gained popularity in recent years...
