Estimating influenza incidence using search query deceptiveness and generalized ridge regression

01/11/2019
by Reid Priedhorsky, et al.

Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge through in-person patient contact, which is accurate but time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces that have a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that knowledge of deceptiveness does reduce error in such estimates, that it may help automatically selected features perform as well as or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. Doing so may bring us one step closer to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates.
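The abstract does not spell out the regression itself, so the following is only a minimal sketch of what a generalized ridge regression with per-feature penalties could look like. It assumes the objective ||y - Xb||^2 + sum_j penalties[j] * b_j^2 and assumes that a more deceptive feature is simply given a larger penalty; that mapping, the toy numbers, and the names `solve`, `generalized_ridge`, `informative`, and `deceptive` are all hypothetical and are not taken from the paper.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def generalized_ridge(X, y, penalties):
    """Minimize ||y - X b||^2 + sum_j penalties[j] * b_j^2.

    Solves the normal equations (X^T X + diag(penalties)) b = X^T y.
    """
    p = len(X[0])
    A = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    for j in range(p):
        A[j][j] += penalties[j]
    rhs = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(A, rhs)


# Toy data (invented for illustration): weekly incidence and two search features.
incidence = [1.0, 2.0, 4.0, 3.0, 1.0, 2.0]
informative = [1.1, 1.9, 4.2, 2.8, 1.0, 2.1]   # tracks incidence closely
deceptive = [2.0, 1.0, 3.0, 3.0, 0.5, 2.5]     # only coincidentally related
X = [[a, d] for a, d in zip(informative, deceptive)]

# Same penalty on both features (ordinary ridge).
uniform = generalized_ridge(X, incidence, [1.0, 1.0])
# Assumption: deceptiveness knowledge is expressed as a larger penalty.
heavy = generalized_ridge(X, incidence, [1.0, 100.0])

print("uniform penalties:", uniform)
print("deceptive feature penalized:", heavy)
```

With a diagonal penalty matrix, raising the penalty on one feature shrinks that feature's coefficient toward zero, so the estimate leans more heavily on the features believed to be informative. Whether this reduces error on real data depends on how well the deceptiveness estimates match reality, which is the question the abstract addresses.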


