Interpretable Models Capable of Handling Systematic Missingness in Imbalanced Classes and Heterogeneous Datasets

by Sreejita Ghosh, et al.

Applying interpretable machine learning techniques to medical datasets facilitates early and fast diagnoses and provides deeper insight into the data. Furthermore, the transparency of these models increases trust among application domain experts. Medical datasets commonly suffer from heterogeneous measurements, imbalanced classes with limited sample sizes, and missing data, all of which hinder the straightforward application of machine learning techniques. In this paper we present a family of prototype-based (PB) interpretable models capable of handling these issues. The models introduced in this contribution perform comparably to or better than alternative techniques applicable in such situations. However, unlike ensemble-based models, which have to compromise on ease of interpretation, the PB models presented here do not. Moreover, we propose a strategy for harnessing the power of ensembles while maintaining the intrinsic interpretability of the PB models, by averaging the model parameter manifolds. All the models were evaluated on a synthetic (publicly available) dataset, in addition to detailed analyses of two real-world medical datasets (one of them publicly available). The results indicated that the models and strategies we introduced addressed the challenges of real-world medical data while remaining computationally inexpensive and transparent, and performed similarly to or better than their alternatives.
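To make the two ideas in the abstract more concrete (nearest-prototype classification that tolerates missing values, and combining an ensemble into a single interpretable model by averaging parameters), here is a minimal sketch under several stated assumptions. It assumes a nearest-prototype classifier with a per-model relevance matrix, in the spirit of matrix LVQ; it handles missing values by simply dropping unobserved features from the distance; and it approximates "averaging the model parameter manifolds" with a plain element-wise mean of corresponding prototypes and relevance matrices. None of these choices is claimed to be the authors' method, and the function names `masked_distance`, `predict`, and `average_models` as well as the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Matrix = List[List[float]]


def masked_distance(x: Sequence[Optional[float]], w: Sequence[float], lam: Matrix) -> float:
    """Squared distance (x - w)^T Lambda (x - w) computed over observed features only.

    Missing entries of x are marked with None and dropped from the sum
    (assumption: a simple marginalisation strategy, not the paper's method).
    """
    observed = [i for i, v in enumerate(x) if v is not None]
    if not observed:
        raise ValueError("sample has no observed features")
    diff = {i: x[i] - w[i] for i in observed}
    return sum(diff[i] * lam[i][j] * diff[j] for i in observed for j in observed)


def predict(x: Sequence[Optional[float]], model: Dict) -> str:
    """Nearest-prototype rule: return the label of the closest prototype."""
    dists = [masked_distance(x, w, model["lambda"]) for w in model["prototypes"]]
    best = min(range(len(dists)), key=dists.__getitem__)
    return model["labels"][best]


def average_models(models: List[Dict]) -> Dict:
    """Average corresponding prototypes and relevance matrices of ensemble members.

    Assumes every member has the same number of prototypes, in the same order,
    with the same labels. An average of positive semi-definite matrices is
    itself positive semi-definite, so the result is still a valid relevance matrix.
    """
    ref = models[0]
    if any(m["labels"] != ref["labels"] for m in models):
        raise ValueError("ensemble members must share prototype labels and order")
    n = len(models)
    prototypes = [
        [sum(m["prototypes"][k][d] for m in models) / n for d in range(len(ref["prototypes"][k]))]
        for k in range(len(ref["prototypes"]))
    ]
    lam = [
        [sum(m["lambda"][i][j] for m in models) / n for j in range(len(ref["lambda"][i]))]
        for i in range(len(ref["lambda"]))
    ]
    return {"prototypes": prototypes, "labels": list(ref["labels"]), "lambda": lam}


# Two hypothetical ensemble members (e.g. trained on different bootstrap samples).
model_a = {"prototypes": [[0.0, 0.0], [2.0, 2.0]], "labels": ["healthy", "patient"],
           "lambda": [[1.0, 0.0], [0.0, 1.0]]}
model_b = {"prototypes": [[0.5, -0.5], [1.5, 2.5]], "labels": ["healthy", "patient"],
           "lambda": [[0.5, 0.0], [0.0, 1.5]]}

averaged = average_models([model_a, model_b])
print(averaged["prototypes"])          # [[0.25, -0.25], [1.75, 2.25]]
print(predict([1.5, None], averaged))  # patient (only feature 0 observed)
print(predict([None, 0.0], averaged))  # healthy (only feature 1 observed)
```

Because a sample's distances to all prototypes are computed over the same set of observed features, the comparison between prototypes stays consistent even though absolute distances shrink when features are missing. The averaged model remains a single set of prototypes plus one relevance matrix, so it can be inspected in the same way as any individual member.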
