A Nonparametric Bayesian Method for Clustering of High-Dimensional Mixed Dataset

08/13/2018
by Chetkar Jha, et al.

Motivation: Advances in next-generation sequencing (NGS) methods have enabled researchers and agencies to collect a wide variety of sequencing data across multiple platforms. The aim is to analyze these datasets jointly in order to gain insight into disease prognosis, treatment, and cure. Clustering such datasets can provide much-needed insight into biological associations. However, the differing scales and the heterogeneity of the mixed data are a hurdle for such analyses. Results: The paper proposes a nonparametric Bayesian approach, called Gen-VariScan, for biclustering of high-dimensional mixed data. Generalized linear models (GLM) and latent variable approaches are used to integrate the mixed data types. The sparsity-inducing property of the Poisson-Dirichlet process (PDP) is used to identify a lower-dimensional structure among the mixed covariates. We apply our method to a Glioblastoma Multiforme (GBM) cancer dataset. We show that cluster detection is a posteriori consistent as the number of covariates and subjects grows. As a byproduct, we derive a working value approach to perform beta regression.
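
To make the sparsity-inducing behavior of the PDP concrete, the sketch below samples a random partition from the two-parameter Chinese restaurant process, the standard sequential representation of a Poisson-Dirichlet (Pitman-Yor) process. It is not the Gen-VariScan algorithm from the paper: the function name `pitman_yor_partition`, its parameters, and the restriction to a positive concentration are assumptions made for this illustration only.

```python
import random


def pitman_yor_partition(n_items, discount, concentration, seed=0):
    """Sample cluster labels from a two-parameter Chinese restaurant process.

    Hypothetical helper for illustration; not taken from the paper.
    Assumes 0 <= discount < 1 and concentration > 0.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= 0:
        raise ValueError("concentration must be positive in this sketch")

    rng = random.Random(seed)
    labels = []  # cluster index assigned to each item
    sizes = []   # current number of items in each cluster

    for _ in range(n_items):
        # An existing cluster k is chosen with weight (size_k - discount);
        # a new cluster is opened with weight
        # (concentration + discount * number_of_clusters).
        weights = [size - discount for size in sizes]
        weights.append(concentration + discount * len(sizes))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)

    return labels, sizes


if __name__ == "__main__":
    labels, sizes = pitman_yor_partition(500, discount=0.5, concentration=1.0)
    print(f"{len(labels)} items grouped into {len(sizes)} clusters")
```

Because items join existing clusters with probability proportional to cluster size, the number of occupied clusters typically grows much more slowly than the number of items. This is the sense in which such a prior can map many covariates onto a lower-dimensional set of clusters.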

research
11/03/2017

Bayesian Nonparametric Mixed Effects Models in Microbiome Data Analysis

Detecting associations between microbial composition and sample characte...
research
02/07/2019

Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models

For high-dimensional linear regression models, we review and compare sev...
research
12/19/2017

High dimensional Single Index Bayesian Modeling of the Brain Atrophy over time

We study the effects of gender, APOE genes, age, genetic variation and A...
research
02/03/2017

Sharp Convergence Rates for Forward Regression in High-Dimensional Sparse Linear Models

Forward regression is a statistical model selection and estimation proce...
research
10/21/2015

Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

We propose a novel method for multiple clustering that assumes a co-clus...
research
06/30/2020

Hierarchical Qualitative Clustering – clustering mixed datasets with critical qualitative information

Clustering can be used to extract insights from data or to verify some o...
research
02/16/2021

A Bayesian Framework for Generation of Fully Synthetic Mixed Datasets

Much of the micro data used for epidemiological studies contain sensitiv...
