Irreproducibility; Nothing is More Predictable
The increasing ease of data capture and storage has led to a corresponding increase in the choice of data, the type of analysis performed on that data, and the complexity of the analysis performed. The main contribution of this paper is to show that the subjective choice of data and analysis methodology substantially impacts the identification of factors and outcomes of observational studies. This subjective variability of inference is at the heart of recent discussions around irreproducibility in scientific research. To demonstrate this subjective variability, data is taken from an existing study, where interest centres on understanding the factors associated with a young adult's propensity to fall into the category of 'not in employment, education or training' (NEET). A fully probabilistic analysis is performed, set in a Bayesian framework and implemented using Reversible Jump Markov chain Monte Carlo (RJMCMC). The results show that different techniques lead to different inference but that models consisting of different factors often have the same predictive performance, whether the analysis is frequentist or Bayesian, making inference problematic. We demonstrate how prior distributions in Bayesian techniques can be used as a tool for assessing a factor's importance.