Modeling rates of disease with missing categorical data

by Rob Trangucci, et al.

Covariates like age, sex, and race/ethnicity provide invaluable insight to public health authorities trying to interpret surveillance data collected during a public health emergency such as the COVID-19 pandemic. However, the utility of such data is limited when many cases are missing key covariates. This problem is most concerning when the missingness likely depends on the values of the missing covariates, i.e., when the covariates are not missing at random (NMAR). We propose a Bayesian parametric model that leverages joint information on spatial variation in the disease and covariate missingness processes and can accommodate both missing-at-random (MAR) and NMAR missingness. We show that the model is locally identifiable when the spatial distribution of the population covariates is known and observed cases can be associated with a spatial unit of observation. We also use a simulation study to investigate the model's finite-sample performance. We compare our model's performance on NMAR data against complete-case analysis and multiple imputation (MI), both of which are commonly used by public health researchers when confronted with missing categorical covariates. Finally, we model spatial variation in cumulative COVID-19 incidence in Wayne County, Michigan using data from the Michigan Department of Health and Human Services. The analysis suggests that population relative risk estimates by race during the early part of the COVID-19 pandemic in Michigan were understated for non-white residents compared to white residents when cases missing race were dropped or had these values imputed using MI.
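To make the direction of the complete-case bias concrete, here is a minimal arithmetic sketch. It is not the authors' model and uses no data from the paper: the group labels, population sizes, case counts, and reporting probabilities are all hypothetical, chosen only to show that when the covariate is recorded less often for the higher-risk group, dropping cases with a missing covariate shrinks the estimated relative risk by the ratio of the reporting probabilities.

```python
# Hypothetical illustration only: all numbers below are invented, not taken from the paper.

def relative_risk(cases_ref, pop_ref, cases_cmp, pop_cmp):
    """Incidence in the comparison group divided by incidence in the reference group."""
    return (cases_cmp / pop_cmp) / (cases_ref / pop_ref)

# Assumed population sizes and true case counts for two groups.
population = {"ref": 100_000, "cmp": 50_000}
true_cases = {"ref": 500, "cmp": 500}

# Assumed probability that the covariate is recorded for a case in each group.
# It depends on the covariate value itself, so the missingness is NMAR.
p_recorded = {"ref": 0.9, "cmp": 0.6}

# Expected number of cases that survive a complete-case analysis.
complete_cases = {g: true_cases[g] * p_recorded[g] for g in true_cases}

true_rr = relative_risk(
    true_cases["ref"], population["ref"], true_cases["cmp"], population["cmp"]
)
complete_case_rr = relative_risk(
    complete_cases["ref"], population["ref"], complete_cases["cmp"], population["cmp"]
)

print(f"true relative risk:          {true_rr:.3f}")
print(f"complete-case relative risk: {complete_case_rr:.3f}")
```

In this toy setup the complete-case estimate equals the true relative risk multiplied by 0.6 / 0.9, so it is understated whenever the comparison group's covariate is recorded less often than the reference group's.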


