High-resolution home location prediction from tweets using deep learning with dynamic structure
High-resolution prediction of people's home locations has applications in diverse fields, including agriculture, transportation, and public health. The goal here is to obtain an accurate estimate of the home locations of a sufficiently large subset of the population for subsequent use in models for the application domain. Conventional data sources, such as censuses and surveys, have a substantial time lag and cannot capture seasonal trends. There has been much recent interest in using social media data to overcome this limitation. However, such data is usually sparse and noisy, and a user's home location is just one of several check-in locations. Due to these constraints, much previous research has aimed at a coarse spatial resolution, such as the time zone, state, or city level. This is inadequate for important applications. For example, vector control to prevent epidemics would benefit from 200m resolution. Recent work has used a Support Vector Classifier on Twitter metadata for such resolution, obtaining 70% [...] test population with 100m resolution. In contrast, we developed a deep learning model for this problem, applying a dynamic structure consisting of a random forest in the first phase and two fully connected deep neural networks in the second phase. We obtained over 90% [...] the large user base for Twitter, this is a sufficiently large subset for use in the modeling applications that we target. We believe that ours is the highest accuracy obtained for high-resolution home location prediction from Twitter data, both for the entire sample and for its subsets. The primary contribution of this work is a deep learning solution that uses a dynamic structure to cope with sparse and noisy social media data and yield accurate, high-resolution home locations from Twitter data.