Joint integrative analysis of multiple data sources with correlated vector outcomes

by Emily C. Hector, et al.

We propose a distributed quadratic inference function framework to jointly estimate regression parameters from multiple potentially heterogeneous data sources with correlated vector outcomes. The primary goal of this joint integrative analysis is to estimate covariate effects on all outcomes through a marginal regression model in a statistically and computationally efficient way. We develop a data integration procedure for estimation and inference of regression parameters that is implemented in a fully distributed and parallelized computational scheme. To overcome the computational and modeling challenges arising from the high-dimensional likelihood of the correlated vector outcomes, we propose to analyze each data source using the quadratic inference functions of Qu, Lindsay and Li (2000), and then to jointly reestimate parameters from each data source, accounting for correlation between data sources, using a combined meta-estimator in the spirit of the generalized method of moments of Hansen (1982). We show both theoretically and numerically that the proposed method yields efficiency improvements and is computationally fast. We illustrate the proposed methodology with the joint integrative analysis of the association between smoking and metabolites in a large multi-cohort study, and we provide an R package for ease of implementation.
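The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation or their R package, and it rests on several simplifying assumptions that are not in the abstract: a linear marginal model with identity link, a regression parameter shared by all data sources, the same subjects measured in every source (rows aligned), identity and exchangeable basis matrices, and a two-step rather than fully iterated quadratic inference function fit. All function names (`block_qif`, `combine`, `simulate`, and so on) are hypothetical. Each source is fitted separately, then the extended scores from all sources are stacked and weighted by the inverse of their joint sample covariance, which is how between-source correlation enters the combined estimate.

```python
import numpy as np


def basis_matrices(m):
    """Identity and exchangeable basis matrices for m outcomes (an assumption)."""
    return [np.eye(m), np.ones((m, m)) - np.eye(m)]


def extended_scores(X, y, beta):
    """Per-subject extended scores, shape (N, 2p), for an identity-link model.

    X has shape (N, m, p) and y has shape (N, m).
    """
    resid = y - X @ beta
    return np.hstack([np.einsum("nmp,mk,nk->np", X, M, resid)
                      for M in basis_matrices(X.shape[1])])


def moment_parts(X, y):
    """Return (a, B) such that the mean extended score equals a - B @ beta."""
    n, m, p = X.shape
    a = extended_scores(X, y, np.zeros(p)).mean(axis=0)
    B = np.vstack([np.einsum("nmp,mk,nkq->pq", X, M, X) / n
                   for M in basis_matrices(m)])
    return a, B


def gmm_solve(a, B, C):
    """Minimize (a - B beta)' C^{-1} (a - B beta) in closed form."""
    CiB = np.linalg.solve(C, B)
    Cia = np.linalg.solve(C, a)
    return np.linalg.solve(B.T @ CiB, B.T @ Cia)


def block_qif(X, y):
    """Two-step QIF-type fit for one data source (OLS start, then reweight)."""
    n, m, p = X.shape
    beta0 = np.linalg.lstsq(X.reshape(-1, p), y.reshape(-1), rcond=None)[0]
    G = extended_scores(X, y, beta0)
    a, B = moment_parts(X, y)
    return gmm_solve(a, B, G.T @ G / n)


def combine(blocks):
    """GMM-style combination over stacked scores from all data sources."""
    n = blocks[0][0].shape[0]
    scores, a_parts, B_parts = [], [], []
    for X, y in blocks:  # each iteration could run on a separate worker
        beta_k = block_qif(X, y)
        scores.append(extended_scores(X, y, beta_k))
        a, B = moment_parts(X, y)
        a_parts.append(a)
        B_parts.append(B)
    Psi = np.hstack(scores)      # (N, 2p * number of sources)
    V = Psi.T @ Psi / n          # joint covariance, including cross-source terms
    return gmm_solve(np.concatenate(a_parts), np.vstack(B_parts), V)


def simulate(n=500, m=3, n_blocks=2, beta=(1.0, -0.5), seed=0):
    """Toy data: a shared subject effect correlates outcomes within and across sources."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta)
    shared = rng.normal(size=(n, 1))
    blocks = []
    for _ in range(n_blocks):
        # Covariates vary within subject; no intercept column is included,
        # since its identity and exchangeable scores would be collinear.
        X = rng.normal(size=(n, m, beta.size))
        y = X @ beta + shared + rng.normal(size=(n, m))
        blocks.append((X, y))
    return blocks


if __name__ == "__main__":
    print(combine(simulate()))
```

Because the toy model is linear, both stages have closed-form solutions; a nonlinear link would require iterating the per-source fit, and heterogeneous parameters across sources would require a different combination step.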




