Entity Extraction from Wikipedia List Pages

03/11/2020
by Nicolas Heist, et al.

When it comes to factual knowledge about a wide range of domains, Wikipedia is often the prime source of information on the web. DBpedia and YAGO, as large cross-domain knowledge graphs, encode a subset of that knowledge by creating an entity for each page in Wikipedia and connecting these entities through edges. It is well known, however, that Wikipedia-based knowledge graphs are far from complete. In particular, because Wikipedia's policies permit pages only about subjects with a certain popularity, such graphs tend to lack information about less well-known entities. Information about these entities is often available in the encyclopedia, but not represented as an individual page. In this paper, we present a two-phased approach for the extraction of entities from Wikipedia's list pages, which have proven to be a valuable source of information. In the first phase, we build a large taxonomy from categories and list pages with DBpedia as a backbone. With distant supervision, we then extract training data for the identification of new entities in list pages, which we use in the second phase to train a classification model. With this approach, we extract over 700k new entities and extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
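To make the distant-supervision step more concrete, here is a minimal Python sketch of how training labels for list-page entries might be derived from types already present in a knowledge graph. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not describe the labelling rule, so the rule used here (positive if a linked entity already carries the list's type, negative if it carries a type assumed to be disjoint with it) is a guess at one plausible scheme. The names `ListEntry`, `label_entries`, `known_types`, `list_type`, and `disjoint_types`, as well as the example data, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class ListEntry:
    """One entry of a list page (hypothetical structure)."""
    text: str
    linked_entity: Optional[str]  # None if the entry links to no known entity


def label_entries(
    entries: List[ListEntry],
    known_types: Dict[str, Set[str]],
    list_type: str,
    disjoint_types: Set[str],
) -> List[Tuple[ListEntry, bool]]:
    """Assign distant-supervision labels to entries that link to known entities.

    Assumed rule: an entry is positive if its entity already has the type
    attached to the list page, negative if it has a type assumed to be
    disjoint with it. Entries without evidence receive no label.
    """
    labelled = []
    for entry in entries:
        types = known_types.get(entry.linked_entity) if entry.linked_entity else None
        if not types:
            continue  # unknown entity: left for the classifier, not used for training
        if list_type in types:
            labelled.append((entry, True))
        elif types & disjoint_types:
            labelled.append((entry, False))
    return labelled


# Hypothetical example: a list page whose taxonomy type is "Musician".
known_types = {
    "Miles_Davis": {"Musician", "Person"},
    "Blue_Note_Records": {"RecordLabel", "Organisation"},
}
entries = [
    ListEntry("Miles Davis", "Miles_Davis"),
    ListEntry("Blue Note Records", "Blue_Note_Records"),
    ListEntry("A lesser-known player", None),
]
labelled = label_entries(entries, known_types, "Musician", {"Organisation"})
labels = [(entry.text, label) for entry, label in labelled]
```

In this sketch, the unlabelled third entry is the kind of case a classifier trained on the labelled entries would later decide on.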

