Evaluating Model Performance in Medical Datasets Over Time

by Helen Zhou, et al.

Machine learning (ML) models deployed in healthcare systems face data drawn from continually evolving environments, yet researchers proposing such models typically evaluate them in a time-agnostic manner, splitting datasets by patients sampled randomly across the entire study period. This work proposes the Evaluation on Medical Datasets Over Time (EMDOT) framework, which evaluates the performance of a model class across time. Inspired by backtesting, EMDOT simulates the training procedures practitioners could have executed at each point in time and evaluates the resulting models on all future time points. Evaluating both linear and more complex models on six distinct medical data sources (tabular and imaging), we show that, depending on the dataset, training on all historical data is ideal in many cases, while training on a window of only the most recent data is advantageous in others. For datasets where models suffer sudden degradations in performance, we investigate plausible explanations for these shocks. We release the EMDOT package to facilitate further work on deployment-oriented evaluation over time.
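The backtesting procedure the abstract describes can be sketched in a few lines: at each time point, train on history (either all of it, or a sliding window of recent periods) and evaluate on every future time point. The sketch below is illustrative only, not the EMDOT package's actual API; the function names `backtest`, `train_fn`, and `eval_fn` are hypothetical.

```python
import numpy as np

def backtest(X, y, t, train_fn, eval_fn, window=None):
    """Simulate evaluation over time: for each time period, fit a model on
    historical data (all history, or the most recent `window` periods) and
    score it on every future period. Returns {(train_period, test_period): score}.
    """
    periods = np.unique(t)  # sorted unique time periods
    results = {}
    for i, t_train in enumerate(periods[:-1]):  # last period has no future data
        lo = periods[max(0, i + 1 - window)] if window else periods[0]
        hist = (t >= lo) & (t <= t_train)       # training slice ends at t_train
        model = train_fn(X[hist], y[hist])
        for t_test in periods[i + 1:]:          # evaluate on all future periods
            fut = t == t_test
            results[(t_train, t_test)] = eval_fn(model, X[fut], y[fut])
    return results

# Toy usage: the "model" is just the historical mean of y; the metric is MSE.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = rng.normal(size=40)
t = np.repeat([2018, 2019, 2020, 2021], 10)

res_all = backtest(X, y, t,
                   train_fn=lambda X, y: y.mean(),
                   eval_fn=lambda m, X, y: float(np.mean((y - m) ** 2)))
res_win = backtest(X, y, t, window=1,
                   train_fn=lambda X, y: y.mean(),
                   eval_fn=lambda m, X, y: float(np.mean((y - m) ** 2)))
```

Comparing `res_all` against `res_win` for each (train, test) pair is the kind of all-history-versus-recent-window comparison the paper runs across its six data sources.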



