AutoScore-Imbalance: An interpretable machine learning tool for development of clinical scores with rare events data

by   Han Yuan, et al.

Background: Medical decision-making impacts both individual and public health. Clinical scores are commonly used among a wide variety of decision-making models for determining the degree of disease deterioration at the bedside. AutoScore was proposed as a useful clinical score generator based on machine learning and a generalized linear model. Its current framework, however, still leaves room for improvement when addressing unbalanced data of rare events. Methods: Using machine intelligence approaches, we developed AutoScore-Imbalance, which comprises three components: training dataset optimization, sample weight optimization, and adjusted AutoScore. All scoring models were evaluated on the basis of their area under the curve (AUC) in the receiver operating characteristic analysis and balanced accuracy (i.e., mean value of sensitivity and specificity). By utilizing a publicly accessible dataset from Beth Israel Deaconess Medical Center, we assessed the proposed model and baseline approaches in the prediction of inpatient mortality. Results: AutoScore-Imbalance outperformed baselines in terms of AUC and balanced accuracy. The nine-variable AutoScore-Imbalance sub-model achieved the highest AUC of 0.786 (0.732-0.839) while the eleven-variable original AutoScore obtained an AUC of 0.723 (0.663-0.783), and the logistic regression with 21 variables obtained an AUC of 0.743 (0.685-0.800). The AutoScore-Imbalance sub-model (using down-sampling algorithm) yielded an AUC of 0. 0.771 (0.718-0.823) with only five variables, demonstrating a good balance between performance and variable sparsity. Conclusions: The AutoScore-Imbalance tool has the potential to be applied to highly unbalanced datasets to gain further insight into rare medical events and to facilitate real-world clinical decision-making.


Density-Aware Personalized Training for Risk Prediction in Imbalanced Medical Data

Medical events of interest, such as mortality, often happen at a low rat...

Undersampling and Cumulative Class Re-decision Methods to Improve Detection of Agitation in People with Dementia

Agitation is one of the most prevalent symptoms in people with dementia ...

AutoScore-Ordinal: An interpretable machine learning framework for generating scoring models for ordinal outcomes

Background: Risk prediction models are useful tools in clinical decision...

On the Importance of Diversity in Re-Sampling for Imbalanced Data and Rare Events in Mortality Risk Models

Surgical risk increases significantly when patients present with comorbi...

Please sign up or login with your details

Forgot password? Click here to reset