Proxy expenditure weights for Consumer Price Index: Audit sampling inference for big data statistics
Purchase data from retail chains provide proxy measures of private household expenditure on the items that are most troublesome to collect in a traditional expenditure survey. Because of the sheer volume of proxy data, the bias due to coverage and selection errors completely dominates the variance. We develop tests for bias based on audit sampling, which make use of available survey data that cannot be linked to the proxy data source at the individual level. However, audit sampling fails to yield a meaningful mean squared error estimate, because the sampling variance of the audit is too large compared to the bias of the big data estimate. We propose a novel accuracy measure that is applicable in such situations. This measure can provide a necessary part of the statistical argument for adopting a big data source as a replacement for traditional survey sampling. An application to a disaggregated food price index demonstrates the proposed approach.
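To illustrate why an audit sample may not deliver a usable mean squared error estimate, the following is a minimal sketch using the standard bias-variance decomposition. It is not taken from the paper; the symbols $\hat\theta_B$ (big data estimate, treated as fixed because its variance is negligible), $\hat\theta_A$ (an audit sample estimator assumed unbiased for the target $\theta$), $b$, and $v_A$ are notation assumed here for illustration.

```latex
% Assumed notation (not from the source):
%   \hat\theta_B : big data estimate, variance treated as negligible
%   \hat\theta_A : audit sample estimator, assumed unbiased for \theta
%   b = \hat\theta_B - \theta,  v_A = \mathrm{Var}(\hat\theta_A)
\begin{align*}
\mathrm{MSE}(\hat\theta_B)
  &= \mathrm{Var}(\hat\theta_B) + b^2 \approx b^2, \\
\mathrm{E}\!\left[(\hat\theta_B - \hat\theta_A)^2\right]
  &= b^2 + v_A, \\
\widehat{b^2}
  &= (\hat\theta_B - \hat\theta_A)^2 - \hat v_A .
\end{align*}
```

Under these assumptions, when $v_A$ is much larger than $b^2$ the estimate $\widehat{b^2}$ is dominated by sampling noise and can even be negative, which is one way to read the statement that the audit cannot yield a meaningful mean squared error estimate.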