Methodological variations in lagged regression for detecting physiologic drug effects in EHR data

01/26/2018
by Matthew E. Levine, et al.

We studied how lagged linear regression can be used to detect the physiologic effects of drugs from data in the electronic health record (EHR). We systematically examined how six methodological variations affect the performance of lagged linear methods in this context: (i) time-series construction, (ii) temporal parameterization, (iii) intra-subject normalization, (iv) differencing (lagged rates of change obtained by taking differences between consecutive measurements), (v) explanatory variables, and (vi) regression models. We generated two gold standards (one knowledge-base-derived, one expert-curated) for expected pairwise relationships between 7 drugs and 4 labs, and evaluated how well each of the 64 unique combinations of methodological perturbations reproduces the gold standards. Our 28 cohorts included patients in the Columbia University Medical Center/NewYork-Presbyterian Hospital clinical database. The most accurate methods achieved an AUROC of 0.794 for the knowledge-base-derived gold standard (95% CI [0.629, 0.781]). We observed a mean AUROC of 0.633 (expert-curated gold standard) across all methods that re-parameterize time according to measurement sequence and use either a joint autoregressive model with differencing or an independent lag model without differencing. The remaining methods achieved a mean AUROC close to 0.5 (chance level), indicating the importance of these choices. We conclude that time-series analysis of EHR data will likely rely on some of the beneficial pre-processing and modeling methodologies identified here, and will certainly benefit from continued careful analysis of methodological perturbations. This study found that methodological variations, such as pre-processing and data representation, significantly affect results, underscoring the importance of evaluating these components when comparing machine-learning methods.
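To make some of the variations above concrete, here is a minimal sketch, not the authors' code. It assumes one patient's drug exposures and lab values are already aligned on a shared measurement sequence, so time is re-parameterized by position in the list rather than by clock time. It illustrates intra-subject normalization (a per-patient z-score, which is an assumed choice), differencing (applied here to both series, also an assumption), an independent lag model (read here as one separate univariate regression per lag, in contrast to a joint model with all lags at once), and AUROC scoring via the Mann-Whitney statistic. All function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Intra-subject normalization: center and scale one patient's series."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def difference(values):
    """Differences between consecutive measurements (lagged rates of change)."""
    return [b - a for a, b in zip(values, values[1:])]


def ols_slope(x, y):
    """Least-squares slope b in y ~ a + b * x (0.0 if x is constant)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx if sxx > 0 else 0.0


def independent_lag_slopes(drug, lab, max_lag, normalize=False, diff=False):
    """Fit a separate regression lab[t] ~ a + b_k * drug[t - k] for each lag k.

    drug and lab are aligned on the same measurement sequence, so the lag k
    counts steps in the sequence rather than hours or days.
    Assumption: normalization is applied to the lab series only, and
    differencing (if requested) is applied to both series.
    """
    if normalize:
        lab = zscore(lab)
    if diff:
        drug, lab = difference(drug), difference(lab)
    slopes = {}
    for k in range(1, max_lag + 1):
        # Pair each lab value with the drug value k steps earlier.
        slopes[k] = ols_slope(drug[:-k], lab[k:])
    return slopes


def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy series: the lab rises by 2 units one step after each drug administration.
drug = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
lab = [3] + [3 + 2 * d for d in drug[:-1]]
print(independent_lag_slopes(drug, lab, max_lag=2, diff=True))
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # prints 1.0
```

In an evaluation like the one described, each drug-lab pair would receive a score derived from its fitted lag coefficients, and `auroc` would compare those scores against the gold standard's expected relationships; how the study converted coefficients into a single score per pair is not stated here, so that step is left out of the sketch.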
