Regularization and Hierarchical Prior Distributions for Adjustment with Health Care Claims Data: Rethinking Comorbidity Scores
Health care claims data refer to information generated from interactions within health systems. They have been used in health services research for decades to assess the effectiveness of interventions, determine the quality of medical care, predict disease prognosis, and monitor population health. While claims data are relatively cheap and ubiquitous, they are high-dimensional, sparse, and noisy, and typically require dimension reduction. In health services research, the most common data reduction strategy is a comorbidity index: a single-number summary reflecting overall patient health. We discuss Bayesian regularization strategies and a novel hierarchical prior distribution as better options for dimension reduction in claims data. The specifications are designed to work with a large number of codes while controlling variance by shrinking coefficients towards zero or towards a group-level mean. A comparison of drug-eluting to bare-metal coronary stents illustrates these approaches. In our application, regularization and a hierarchical prior improved over comorbidity scores in terms of prediction and causal inference, as evidenced by out-of-sample fit and the ability to meet falsifiability endpoints.
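The two shrinkage targets mentioned above (zero versus a group-level mean) can be illustrated with a minimal sketch. It is not the authors' specification: it assumes a linear-Gaussian outcome model with a conjugate normal prior, simulated data, a hypothetical grouping of six codes into two groups, and plug-in group means in place of a fully Bayesian hierarchical fit. The function name `posterior_mean` and the penalty `lam` are illustrative choices.

```python
import numpy as np

def posterior_mean(X, y, prior_mean, lam):
    """Posterior mean of beta for y = X @ beta + noise, under the
    prior beta ~ N(prior_mean, (sigma^2 / lam) * I).
    lam is the ratio of noise variance to prior variance."""
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    b = X.T @ y + lam * prior_mean
    return np.linalg.solve(A, b)

# Simulated stand-in for sparse binary code indicators (assumption).
rng = np.random.default_rng(0)
n, p = 200, 6
groups = np.array([0, 0, 0, 1, 1, 1])  # hypothetical code groups
true_beta = np.array([1.0, 1.2, 0.8, -0.5, -0.4, -0.6])
X = rng.binomial(1, 0.1, size=(n, p)).astype(float)
y = X @ true_beta + rng.normal(0.0, 1.0, size=n)

lam = 5.0

# Shrink every coefficient towards zero (ridge-type prior).
beta_zero = posterior_mean(X, y, np.zeros(p), lam)

# Shrink each coefficient towards the mean of its group. The group
# means are plug-in estimates taken from the zero-shrunk fit, a
# simplification of estimating them jointly in a hierarchical model.
group_means = np.array(
    [beta_zero[groups == g].mean() for g in np.unique(groups)]
)
beta_group = posterior_mean(X, y, group_means[groups], lam)

print("shrunk towards zero:       ", np.round(beta_zero, 2))
print("shrunk towards group mean: ", np.round(beta_group, 2))
```

As `lam` grows, the estimate moves from the least-squares fit towards the prior mean, so rarely observed codes borrow strength either from zero or from other codes in the same group.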