A statistical machine learning approach for benchmarking in the presence of complex contextual factors and peer groups
The ability to compare individuals or organisations fairly is important for the development of robust and meaningful quantitative benchmarks. To make fair comparisons, contextual factors must be taken into account, and comparisons should only be made between similar organisations, such as those in the same peer group. Previous benchmarking methods have used linear regression to adjust for contextual factors; however, linear regression is known to be sub-optimal when nonlinear relationships exist between the comparative measure and covariates. In this paper we propose a random forest model for benchmarking that can adjust for these potential nonlinear relationships, and validate the approach in a case study of high-noise data. We provide new visualisations and numerical summaries of the fitted models and comparative measures to facilitate interpretation by both analysts and non-technical audiences. Comparisons can be made across the cohort or within peer groups, and bootstrapping provides a means of estimating uncertainty in both adjusted measures and rankings. We conclude that random forest models can facilitate fair comparisons between organisations for quantitative measures, including in cases of complex contextual factor relationships, and that the models and outputs are readily interpreted by stakeholders.
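As an illustration of the general idea only, the following minimal sketch shows one way a random forest adjustment with bootstrapped rankings could be set up. It is not the paper's implementation: the synthetic data, the definition of the adjusted measure as observed minus out-of-bag expected value, the resampling scheme, the peer-group split, and all function names are assumptions made for this example. It uses scikit-learn's RandomForestRegressor and NumPy.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def adjusted_measure(X, y, seed=0):
    """Observed value minus the forest's out-of-bag expected value (assumed definition)."""
    forest = RandomForestRegressor(
        n_estimators=100, min_samples_leaf=5, oob_score=True, random_state=seed
    )
    forest.fit(X, y)
    return y - forest.oob_prediction_


def bootstrap_adjusted(X, y, n_boot=20, seed=0):
    """Refit the forest on resampled organisations; return one row of adjusted measures per refit."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, n))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample organisations with replacement
        forest = RandomForestRegressor(
            n_estimators=50, min_samples_leaf=5, random_state=b
        ).fit(X[idx], y[idx])
        draws[b] = y - forest.predict(X)
    return draws


def ranks(values):
    """Rank along the last axis; rank 1 is the largest adjusted measure."""
    return np.argsort(np.argsort(-values, axis=-1), axis=-1) + 1


# Hypothetical cohort: two contextual factors with a nonlinear effect on the measure.
rng = np.random.default_rng(42)
n = 120
X = rng.uniform(-2, 2, size=(n, 2))
y = 3 * np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 1, n)

adjusted = adjusted_measure(X, y)
draws = bootstrap_adjusted(X, y)
rank_draws = ranks(draws)

# 95% percentile intervals for each organisation's cohort-wide rank.
rank_lo, rank_hi = np.percentile(rank_draws, [2.5, 97.5], axis=0)

# Within-peer-group ranks, using a hypothetical two-group split.
peer = (X[:, 0] > 0).astype(int)
peer_rank = np.empty(n, dtype=int)
for g in np.unique(peer):
    members = peer == g
    peer_rank[members] = ranks(adjusted[members])
```

Out-of-bag predictions are used for the point estimate so that an organisation's own observation does not inform its expected value. The bootstrap here simply refits the forest on resampled organisations and re-ranks everyone; other resampling schemes are possible.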