Supervised variable selection in randomised controlled trials prior to exploration of treatment effect heterogeneity: an example from severe malaria
Exploration of treatment effect heterogeneity (TEH) is an increasingly important aspect of modern statistical analysis for stratified medicine in randomised controlled trials (RCTs), as we start to gather more information on trial participants and wish to maximise the opportunities for learning from data. However, the analyst should refrain from including a large number of variables in a treatment interaction discovery stage, because doing so can substantially dilute the power to detect any true outcome-predictive interactions between treatments and covariates. Current guidance is limited and mainly relies on unsupervised learning methods, such as hierarchical clustering or principal components analysis, to reduce the dimension of the variable space prior to interaction tests. In this article we show that outcome-driven dimension reduction, i.e. supervised variable selection, can maintain power without inflating the type-I error (false-positive) rate. We provide theoretical and applied results to support our approach. The applied results come from illustrating our framework on data from an RCT in severe malaria. We also pay particular attention to the internal risk model approach for TEH discovery, which we show is a special case of our method, and we point to improvements over its current implementation.
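The power-dilution argument can be illustrated with a small calculation. The sketch below is not taken from the article: it assumes, purely for illustration, that each candidate treatment-covariate interaction is tested with a two-sided z-test and that multiplicity is handled by a Bonferroni correction across `n_tests` candidates. The function name `interaction_power` and the noncentrality value 2.8 (chosen to give roughly 80% power for a single test at the 5% level) are hypothetical.

```python
from statistics import NormalDist


def interaction_power(noncentrality, n_tests, alpha=0.05):
    """Power of a two-sided z-test for one true interaction when the
    significance level is Bonferroni-corrected to alpha / n_tests.

    Assumption (not from the article): the test statistic is normal
    with unit variance and mean `noncentrality` under the alternative.
    """
    z = NormalDist()
    # Critical value for a two-sided test at the corrected level.
    crit = z.inv_cdf(1 - alpha / (2 * n_tests))
    # Probability that the statistic falls beyond either critical value.
    return (1 - z.cdf(crit - noncentrality)) + z.cdf(-crit - noncentrality)


# Power for the same true interaction as the number of candidate
# variables screened in the discovery stage grows.
for m in (1, 5, 20, 100):
    print(m, round(interaction_power(2.8, m), 3))
```

Under these assumptions the power for a fixed true interaction falls steadily as more candidate variables are screened, which is the motivation for reducing the variable set before the interaction tests.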