Assessing the uncertainty in statistical evidence with the possibility of model misspecification using a non-parametric bootstrap

11/15/2019
by Mark L. Taper, et al.

Empirical evidence, e.g., an observed likelihood ratio, is an estimator of the difference between the divergences of two competing models (or model sets) from the true generating mechanism. It is unclear how to use such empirical evidence in scientific practice. Scientists usually want to know "how often would I get this level of evidence?" The answer depends on the true generating mechanism as well as the models under consideration. In many situations, having observed the data, we can approximate the true generating mechanism non-parametrically, assuming far less structure than the parametric models being compared. We use a resampling method based on this non-parametric estimate of the true generating mechanism to estimate a confidence interval for the empirical evidence that is robust to model misspecification. Such a confidence interval tells us how variable the empirical evidence would be if the experiment (or observational study) were replicated. In our simulations, the variability in empirical evidence is substantial, and hence using empirical evidence without a measure of its uncertainty is treacherous in practice. Based on the confidence interval, we divide the decision space into six categories: Strong and secure (SS), Strong but insecure (SI), Weak and secure (WS), Weak and insecure (WI), Misleading and insecure (MI), and Misleading and secure (MS). We illustrate the use of these categories for model selection in the context of regression. We show that, compared with the three categories (Strong, Weak, and Misleading) suggested by Royall (1997), the six-category decision process leads to smaller errors in model selection and hence in scientific conclusions.
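
To make the resampling idea concrete, here is a minimal sketch (not the authors' implementation) of a non-parametric bootstrap confidence interval for an observed log-likelihood ratio. The two regression models, the simulated data, the helper name `log_lik_ratio`, and the evidence threshold `k = log(8)` (Royall's conventional benchmark for strong evidence) are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative data; the true generating mechanism is unknown to the analyst.
# Heavy-tailed errors make both Gaussian models mildly misspecified.
n = 100
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.standard_t(df=3, size=n)

def log_lik_ratio(x, y):
    """Observed evidence: log-likelihood ratio of a linear vs. a quadratic
    Gaussian regression, each fit by least squares (the MLE under normality)."""
    def gauss_loglik(design):
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        sigma2 = np.mean(resid ** 2)  # MLE of the error variance
        return np.sum(stats.norm.logpdf(resid, scale=np.sqrt(sigma2)))
    X1 = np.column_stack([np.ones_like(x), x])           # model 1: linear
    X2 = np.column_stack([np.ones_like(x), x, x ** 2])   # model 2: quadratic
    return gauss_loglik(X1) - gauss_loglik(X2)

# Non-parametric bootstrap: resample (x, y) pairs from the empirical
# distribution, which stands in for the true generating mechanism.
B = 2000
boot_ev = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)
    boot_ev[b] = log_lik_ratio(x[idx], y[idx])

lo, hi = np.percentile(boot_ev, [2.5, 97.5])
observed = log_lik_ratio(x, y)
k = np.log(8)  # illustrative strong-evidence threshold

print(f"observed evidence: {observed:.2f}, 95% CI: ({lo:.2f}, {hi:.2f})")
# Reading the result in the six categories: the observed evidence is
# strong or weak depending on whether |observed| exceeds k, and secure
# or insecure depending on whether the whole interval stays on the same
# side of the relevant threshold as the observed value.
```

The interval width, not just the point estimate, drives the classification: a strong observed likelihood ratio whose bootstrap interval straddles the threshold would be labeled strong but insecure (SI) rather than strong and secure (SS).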
