Methodology to Create Analysis-Naive Holdout Records as well as Train and Test Records for Machine Learning Analyses in Healthcare

05/09/2022
by Michele Bennett, et al.

It is common for researchers to hold out data from a study pool for external validation and for future research, and the same need applies to those conducting machine learning research. For this discussion, the purpose of the holdout sample is to preserve data for research studies that will be analysis-naive and randomly selected from the full dataset. Analysis-naive records are records that are not used for testing or training machine learning (ML) models and that do not participate in any aspect of the current machine learning study. The methodology suggested for creating holdouts is a modification of k-fold cross validation, which takes into account randomization and efficiently allows a three-way split (holdout, test and training) as part of the method without forcing. The paper also provides a working example using a set of automated functions in Python and some scenarios for applicability in healthcare.
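The three-way split described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `three_way_split`, the choice of `k`, and the rule of assigning the first shuffled fold to holdout and the second to test are all assumptions made here for illustration. The idea shown is only that shuffling once and cutting the records into k folds yields holdout, test, and training sets that are random and mutually exclusive.

```python
import random


def three_way_split(record_ids, k=10, seed=0):
    """Hypothetical helper (not from the paper): shuffle the record ids,
    cut them into k near-equal folds, then assign one fold to holdout,
    one fold to test, and the remaining k - 2 folds to training."""
    if k < 3:
        raise ValueError("k must be at least 3 for a three-way split")
    ids = list(record_ids)
    if len(ids) < k:
        raise ValueError("need at least k records")

    # A seeded generator keeps the random assignment reproducible.
    rng = random.Random(seed)
    rng.shuffle(ids)

    # Striding through the shuffled list gives k disjoint folds whose
    # sizes differ by at most one record.
    folds = [ids[i::k] for i in range(k)]

    holdout = folds[0]   # analysis-naive: never used for training or testing
    test = folds[1]
    train = [rid for fold in folds[2:] for rid in fold]
    return holdout, test, train


holdout, test, train = three_way_split(range(1000), k=10, seed=42)
print(len(holdout), len(test), len(train))  # 100 100 800
```

With k = 10, this assumed assignment reserves roughly 10% of records as holdout, 10% for testing, and 80% for training. A different k, or assigning more than one fold to a set, would change those proportions.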

