Bayesian Cox Regression for Population-scale Inference in Electronic Health Records

by Alexander W. Jung et al.

The Cox model is an indispensable tool for time-to-event analysis, particularly in biomedical research. However, medicine is undergoing a profound transformation, generating data at an unprecedented scale, which opens new frontiers to study and understand diseases. With the wealth of data collected, new challenges for statistical inference arise, as datasets are often high-dimensional, exhibit an increasing number of measurements at irregularly spaced time points, and are simply too large to fit in memory. Many current implementations for time-to-event analysis are ill-suited for these problems, as inference is computationally demanding and requires access to the full data at once. Here we propose a Bayesian version of the counting-process representation of Cox's partial likelihood for efficient inference on large-scale datasets with millions of data points and thousands of time-dependent covariates. Through the combination of stochastic variational inference and a reweighting of the log-likelihood, we obtain an approximation to the posterior distribution that factorizes over subsamples of the data, enabling analysis in big-data settings. Crucially, the method produces viable uncertainty estimates for large-scale and high-dimensional datasets. We show the utility of our method through a simulation study and an application to myocardial infarction in the UK Biobank.
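The core recipe in the abstract — a counting-process (Poisson-form) likelihood for survival data, a mean-field Gaussian approximation to the posterior over the coefficients, and a minibatch log-likelihood reweighted by N/B so the stochastic gradient is unbiased for the full-data gradient — can be sketched as below. This is an illustrative toy on simulated data (one risk interval per subject, time-constant covariates, a hand-rolled Adam optimizer), not the authors' implementation; all variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- simulate right-censored survival data under proportional hazards ---
N, P = 2000, 3
beta_true = np.array([0.8, -0.5, 0.3])
X = rng.normal(size=(N, P))
T = rng.exponential(1.0 / np.exp(X @ beta_true))   # event times, baseline hazard 1
C = rng.exponential(2.0, size=N)                   # censoring times
time = np.minimum(T, C)
event = (T <= C).astype(float)
offset = np.log(time)    # exposure offset in the Poisson/counting-process form

# --- SVI with a mean-field Gaussian q(beta) = N(mu, diag(sigma^2)) ---
def adam_step(p, g, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam ascent step (we maximize the ELBO)."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    return p + lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps), m, v

mu, log_sigma = np.zeros(P), np.full(P, -1.0)
m_mu, v_mu = np.zeros(P), np.zeros(P)
m_ls, v_ls = np.zeros(P), np.zeros(P)
B, steps = 256, 3000
mu_avg = np.zeros(P)

for t in range(1, steps + 1):
    idx = rng.choice(N, size=B, replace=False)     # random subsample of the data
    sigma = np.exp(log_sigma)
    eps_z = rng.normal(size=P)
    beta = mu + sigma * eps_z                      # reparameterized draw from q
    eta = offset[idx] + X[idx] @ beta
    # Poisson-form log-likelihood gradient, reweighted by N/B (unbiased for full data)
    g_beta = (N / B) * (X[idx].T @ (event[idx] - np.exp(eta)))
    # add analytic gradients of -KL(q || N(0, I)) for a standard-normal prior
    g_mu = g_beta - mu
    g_ls = g_beta * sigma * eps_z - (sigma ** 2 - 1.0)
    mu, m_mu, v_mu = adam_step(mu, g_mu, m_mu, v_mu, t)
    log_sigma, m_ls, v_ls = adam_step(log_sigma, g_ls, m_ls, v_ls, t)
    if t > steps - 500:                            # Polyak average of the last iterates
        mu_avg += mu

mu_hat = mu_avg / 500
print("posterior mean:", mu_hat, " true beta:", beta_true)
```

Because the reweighted minibatch gradient is unbiased, each update touches only B records, so the procedure never needs the full dataset in memory — the property the abstract highlights for population-scale records.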


