Active sampling: A machine-learning-assisted framework for finite population inference with optimal subsamples
Data subsampling has become widely recognized as a tool for overcoming computational and economic bottlenecks in the analysis of massive datasets and in measurement-constrained experiments. However, traditional subsampling methods often suffer from a lack of information at the design stage. We propose an active sampling strategy that iterates between estimation and data collection with optimal subsamples, guided by machine learning predictions on as-yet-unseen data. The method is illustrated on a virtual simulation-based safety assessment of advanced driver assistance systems, where it yielded substantial performance improvements over traditional sampling methods.
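The iterate-between-estimation-and-sampling loop can be illustrated with a small sketch. The abstract does not specify the model, sampling design, or estimator, so everything below is an assumption made for illustration, not the authors' implementation. A least-squares line stands in for the machine learning model. A hypothetical `oracle(i)` stands in for one expensive evaluation (for example, one virtual simulation), and `xs` is a hypothetical auxiliary variable known for every unit. Units are drawn with replacement, and the population total is estimated with the Hansen-Hurwitz estimator. After each round, sampling probabilities are set proportional to sqrt(ŷ_i² + σ²), an estimate of sqrt(E[y_i²]), which is the variance-minimizing choice for that estimator when predictions are unbiased with residual variance σ². These probabilities are mixed with a uniform distribution (hypothetical parameter `eps`) so that no unit gets probability zero.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit y ~ a + b*x; a stand-in for the ML model (assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    # Residual variance, used as the prediction uncertainty sigma^2.
    var = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / max(n - 2, 1)
    return a, b, var


def active_sampling_total(xs, oracle, n_rounds=5, batch=20, eps=0.1, seed=0):
    """Estimate the population total of y by alternating estimation and sampling.

    xs: auxiliary variable known for every unit (hypothetical).
    oracle(i): expensive evaluation of y_i, e.g. one simulation run (hypothetical).
    Returns (estimated total, number of distinct units evaluated).
    """
    rng = random.Random(seed)
    N = len(xs)
    labeled = {}               # unit index -> observed y (each unit evaluated once)
    terms = []                 # Hansen-Hurwitz terms y_i / p_i, one per draw
    probs = [1.0 / N] * N      # first round: no model yet, so sample uniformly
    for _ in range(n_rounds):
        # Draw a batch with replacement under the current probabilities.
        for i in rng.choices(range(N), weights=probs, k=batch):
            if i not in labeled:
                labeled[i] = oracle(i)
            terms.append(labeled[i] / probs[i])
        # Refit the model on all data collected so far and update the design.
        if len(labeled) >= 3:
            ks = list(labeled)
            a, b, var = fit_line([xs[k] for k in ks], [labeled[k] for k in ks])
            # p_i proportional to sqrt(yhat_i^2 + sigma^2), mixed with uniform
            # so that every p_i stays strictly positive.
            w = [math.sqrt((a + b * x) ** 2 + var) for x in xs]
            s = sum(w)
            if s > 0:
                probs = [(1 - eps) * wi / s + eps / N for wi in w]
    # Each term is unbiased for the total given earlier rounds, so their mean is too.
    return sum(terms) / len(terms), len(labeled)
```

Sampling with replacement keeps the estimator simple across rounds. Each round's probabilities depend only on data from earlier rounds, so every term y_i / p_i remains unbiased for the total. Caching `labeled` means a unit drawn twice is evaluated only once.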