Dynamic statistical inference in massive datastreams

by Jingshen Wang, et al.

Modern technological advances have expanded the range of applications that require analysis of large-scale datastreams comprising multiple indefinitely long time series. There is an acute need for statistical methodologies that perform online inference and continuously revise the model to reflect the current status of the underlying process. In this manuscript, we propose a dynamic statistical inference framework, named dynamic tracking and screening (DTS), that not only provides accurate estimates of the underlying parameters in a dynamic statistical model but also rapidly identifies irregular individual streams whose behavioral patterns deviate from the majority. Concretely, by fully exploiting the sequential nature of datastreams, we develop a robust estimation approach within a varying-coefficient model framework. The procedure naturally accommodates unequally spaced design points and updates the coefficient estimates as new data arrive, without the need to store historical data. A data-driven choice of the optimal smoothing parameter is proposed accordingly. Furthermore, we suggest a new multiple testing procedure tailored to the streaming environment. The resulting DTS scheme adapts to time-varying structures, tracks changes in the underlying models, and hence maintains high accuracy in detecting time periods during which individual streams exhibit irregular behavior. Moreover, we derive rigorous statistical guarantees for the procedure and investigate its finite-sample performance through simulation studies. We demonstrate the proposed methods on a mobile health example, estimating the times at which subjects' sleep and physical activities have an unusual influence on their mood.
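
To make the idea of updating coefficient estimates without storing historical data concrete, the following is a minimal sketch and not the DTS estimator itself. It assumes a plain (non-robust) least-squares loss and a one-sided exponential kernel with a fixed bandwidth; the class name `OnlineVaryingCoefficient`, the `bandwidth` and `ridge` parameters, and the simulated stream are all hypothetical and chosen only for illustration. The point is that two running summaries are enough, and that the decay depends on the elapsed time, so unequally spaced observations are handled directly.

```python
import numpy as np


class OnlineVaryingCoefficient:
    """Kernel-weighted least squares with a one-sided exponential kernel.

    Illustrative assumption, not the paper's robust estimator: only the
    running summaries S = sum_i w_i x_i x_i^T and r = sum_i w_i x_i y_i
    are kept, with w_i = exp(-(t_now - t_i) / bandwidth).
    """

    def __init__(self, dim, bandwidth, ridge=1e-8):
        self.h = float(bandwidth)   # smoothing parameter (fixed here, not data-driven)
        self.ridge = ridge          # small ridge term keeps the solve well-posed
        self.S = np.zeros((dim, dim))
        self.r = np.zeros(dim)
        self.t_prev = None

    def update(self, t, x, y):
        """Absorb one observation (t, x, y) and return the current estimate."""
        x = np.asarray(x, dtype=float)
        if self.t_prev is not None:
            # Down-weight all past data by the elapsed time, whatever the gap is.
            decay = np.exp(-(t - self.t_prev) / self.h)
            self.S *= decay
            self.r *= decay
        self.S += np.outer(x, x)
        self.r += x * y
        self.t_prev = t
        return self.estimate()

    def estimate(self):
        dim = self.S.shape[0]
        return np.linalg.solve(self.S + self.ridge * np.eye(dim), self.r)


# Simulated stream with irregular arrival times and a slowly varying slope.
rng = np.random.default_rng(0)
times = np.cumsum(rng.exponential(scale=0.01, size=2000))
model = OnlineVaryingCoefficient(dim=2, bandwidth=0.1)
for t in times:
    x = np.array([1.0, rng.normal()])
    beta_true = np.array([1.0, np.sin(t)])
    y = x @ beta_true + 0.1 * rng.normal()
    beta_hat = model.update(t, x, y)

# Largest coordinate-wise gap between the last estimate and the truth.
final_error = float(np.abs(beta_hat - np.array([1.0, np.sin(times[-1])])).max())
```

Memory use stays constant in the length of the stream because each update touches only `S` and `r`. A smaller bandwidth tracks changes faster at the cost of noisier estimates, which is the trade-off a data-driven smoothing parameter is meant to balance.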




