A logic-based resampling with matching approach to multiple imputation of missing data
Researchers often use model-based multiple imputation to handle data that are missing at random, minimizing bias while making the best use of all available data. However, in some contexts constraints among variables make it very difficult to fit a model, and a generic regression imputation model may produce implausible values. We explore the advantages of a logic-based resampling with matching (RWM) approach to multiple imputation. This approach is similar to random hot deck imputation and allows for more plausible imputations than model-based approaches. We illustrate an RWM approach for multiply imputing missing pain, activity frequency, and sport data using the Childhood Health, Activity, and Motor Performance School Study Denmark (CHAMPS-DK). We match each record with missing data to several observed records, generate probabilities for the matched records using the observed data, and sample from these records according to the probability of each occurring. Because imputed values are generated randomly, multiple complete datasets can be created. These datasets are then analyzed and averaged in the same way as in model-based multiple imputation. This approach can be extended to other datasets as an alternative to model-based approaches, particularly where there are time-dependent ordered categorical variables or other constraints between variables.
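The match, weight, and sample steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used for CHAMPS-DK: the function name `rwm_impute`, the field names, and the rule that a donor must agree exactly on every matching variable are all assumptions made for the example, and the study's own logic-based matching rules are not reproduced here.

```python
import random
from collections import Counter


def rwm_impute(records, target, match_keys, n_imputations=5, seed=0):
    """Hypothetical sketch of resampling with matching.

    records: list of dicts; a missing value of `target` is stored as None.
    Returns a list of `n_imputations` completed copies of `records`.
    """
    rng = random.Random(seed)
    # Donors are the records in which the target variable was observed.
    donors = [r for r in records if r[target] is not None]
    completed = []
    for _ in range(n_imputations):
        dataset = []
        for rec in records:
            rec = dict(rec)  # copy, so the input records are never modified
            if rec[target] is None:
                # Step 1: match to observed records (assumed rule: exact
                # agreement on every matching variable).
                matched = [d for d in donors
                           if all(d[k] == rec[k] for k in match_keys)]
                if matched:
                    # Step 2: probabilities from the observed frequency of
                    # each value among the matched records.
                    counts = Counter(d[target] for d in matched)
                    values = list(counts)
                    weights = [counts[v] for v in values]
                    # Step 3: draw one value according to those probabilities.
                    rec[target] = rng.choices(values, weights=weights, k=1)[0]
                # With no matched donor the value is left missing (assumption).
            dataset.append(rec)
        completed.append(dataset)
    return completed


# Made-up example records; the variable names are illustrative only.
example = [
    {"sex": "F", "grade": 2, "pain": 0},
    {"sex": "F", "grade": 2, "pain": 1},
    {"sex": "F", "grade": 2, "pain": 1},
    {"sex": "M", "grade": 3, "pain": 2},
    {"sex": "F", "grade": 2, "pain": None},
    {"sex": "M", "grade": 3, "pain": None},
]
imputed_sets = rwm_impute(example, "pain", ["sex", "grade"], n_imputations=4)
```

Each element of `imputed_sets` is one completed dataset. An analysis would be run on each, and the estimates combined as in model-based multiple imputation.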