Location reference recognition from texts: A survey and comparison

by Xuke Hu, et al.

A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to the process of recognizing location references in texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Further, a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and a core step of geoparsing, is lacking. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition by categorizing them into four groups based on their underlying functional principle: rule-based, gazetteer matching-based, statistical learning-based, and hybrid approaches. Next, we evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition on 26 public datasets covering different types of texts (e.g., social media posts and news stories) and containing 39,736 location references from across the world. The results of this evaluation can inform future methodological developments for location reference recognition and guide the selection of suitable approaches based on application needs.




