Scanner data in inflation measurement: from raw data to price indices
Scanner data offer new opportunities for CPI or HICP calculation. They can be obtained from a wide variety of retailers (supermarkets, home electronics stores, Internet shops, etc.) and provide information at the level of the barcode. One advantage of scanner data is that they contain complete transaction information, i.e. prices and quantities for every item sold. Before scanner data can be used, they must be carefully processed. After the data are cleaned and product names unified, products should be classified (e.g. into COICOP level 5 or below), matched, filtered and aggregated. These procedures often require building new IT tools or writing custom scripts (R, Python, Mathematica, SAS, others). One of the new challenges connected with scanner data is the appropriate choice of index formula. In this article we present a proposal for implementing the individual stages of scanner data handling. We also point out potential problems that arise during scanner data processing and propose solutions to them. Finally, we compare a large number of price index methods on real scanner datasets and verify their sensitivity to the adopted data filtering and aggregation methods.
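The aggregation, matching and index calculation stages mentioned above can be illustrated with a minimal Python sketch. It is not the implementation proposed in the article: the function names, the `(period, barcode, price, quantity)` record layout and the sample figures are all hypothetical, and the Fisher formula is used only as one familiar example of an index that exploits both prices and quantities. Classification and filtering are omitted for brevity.

```python
from collections import defaultdict
from math import sqrt


def unit_values(transactions):
    """Aggregate transactions to one unit value and total quantity
    per (period, barcode). Record layout is an assumption:
    (period, barcode, price, quantity)."""
    turnover = defaultdict(float)
    quantity = defaultdict(float)
    for period, barcode, price, qty in transactions:
        turnover[(period, barcode)] += price * qty
        quantity[(period, barcode)] += qty
    # Unit value = turnover / quantity sold; skip items with no sales.
    return {
        key: (turnover[key] / quantity[key], quantity[key])
        for key in quantity
        if quantity[key] > 0
    }


def fisher_index(data, base, current):
    """Bilateral Fisher index on barcodes sold in both periods."""
    matched = sorted(
        barcode
        for (period, barcode) in data
        if period == base and (current, barcode) in data
    )
    if not matched:
        raise ValueError("no barcodes matched between the two periods")
    p0q0 = sum(data[(base, b)][0] * data[(base, b)][1] for b in matched)
    p1q0 = sum(data[(current, b)][0] * data[(base, b)][1] for b in matched)
    p0q1 = sum(data[(base, b)][0] * data[(current, b)][1] for b in matched)
    p1q1 = sum(data[(current, b)][0] * data[(current, b)][1] for b in matched)
    laspeyres = p1q0 / p0q0  # base-period quantities as weights
    paasche = p1q1 / p0q1    # current-period quantities as weights
    return sqrt(laspeyres * paasche)


# Hypothetical sample: barcode "C" appears only in the second month,
# so it is dropped by the matching step.
transactions = [
    ("2023-01", "A", 2.00, 10),
    ("2023-01", "A", 2.20, 10),
    ("2023-01", "B", 5.00, 4),
    ("2023-02", "A", 2.31, 10),
    ("2023-02", "B", 5.00, 8),
    ("2023-02", "C", 1.00, 3),
]
data = unit_values(transactions)
index = fisher_index(data, "2023-01", "2023-02")
```

Matching on barcodes present in both periods is the simplest possible rule; a different matching or filtering rule changes which items enter the sums, which is the kind of sensitivity the comparison in the article examines.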