Evaluating author name disambiguation for digital libraries: A case of DBLP

06/27/2018
by Jinseok Kim, et al.

Author name ambiguity in a digital library may affect the findings of research that mines the library's authorship data. This study evaluates author name disambiguation in DBLP, a widely used digital library whose disambiguation performance has been insufficiently evaluated. To do so, the study takes a triangulation approach, on the premise that author name disambiguation for a digital library is better evaluated when its performance is assessed on multiple labeled datasets and compared to baselines. Tested on three types of labeled data containing 5,000 to 6M disambiguated names, DBLP is shown to assign author names quite accurately to distinct authors, with pairwise precision, recall, and F1 measures around 0.90 or above overall. DBLP's author name disambiguation performs well even on large ambiguous name blocks, but is weaker at distinguishing different authors who share the same name. Compared to other disambiguation algorithms, DBLP's disambiguation performance is quite competitive, possibly due to its hybrid approach that combines algorithmic disambiguation with manual error correction. A discussion follows on the strengths and weaknesses of the labeled datasets used in this study, to inform future efforts to evaluate author name disambiguation at digital library scale.
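For readers unfamiliar with the pairwise measures mentioned above, the following is a minimal sketch of how pairwise precision, recall, and F1 are commonly computed for a clustering of records into authors. It assumes the standard pair-counting definitions; the function names, record IDs, and toy data are hypothetical and are not taken from the paper or from DBLP.

```python
from itertools import combinations


def same_author_pairs(assignment):
    """Return the set of record pairs that share an author label.

    `assignment` maps a record ID to an author (cluster) label.
    """
    clusters = {}
    for record_id, label in assignment.items():
        clusters.setdefault(label, []).append(record_id)
    pairs = set()
    for members in clusters.values():
        # Sort so each unordered pair has one canonical representation.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(predicted, truth):
    """Compute pairwise precision, recall, and F1 (assumed standard definitions)."""
    pred_pairs = same_author_pairs(predicted)
    true_pairs = same_author_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    # Convention assumed here: an empty pair set scores 1.0.
    precision = correct / len(pred_pairs) if pred_pairs else 1.0
    recall = correct / len(true_pairs) if true_pairs else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: five records, two true authors.
truth = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "B"}
# A hypothetical disambiguation result that misplaces record r3.
predicted = {"r1": "x", "r2": "x", "r3": "y", "r4": "y", "r5": "y"}

precision, recall, f1 = pairwise_scores(predicted, truth)
print(precision, recall, f1)
```

In this toy case, two of the four predicted same-author pairs are correct and two of the four true pairs are recovered, so all three measures come out to 0.5. Precision drops when records by different authors are merged, and recall drops when one author's records are split.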


research
11/01/2022

A Bayesian Learning, Greedy agglomerative clustering approach and evaluation techniques for Author Name Disambiguation Problem

Author names often suffer from ambiguity owing to the same author appear...
research
03/03/2017

Coverage of Author Identifiers in Web of Science and Scopus

As digital collections of scientific literature are widespread and used ...
research
02/05/2021

Generating automatically labeled data for author name disambiguation: An iterative clustering method

To train algorithms for supervised author name disambiguation, many stud...
research
02/05/2021

ORCID-linked labeled data for evaluating author name disambiguation at scale

How can we evaluate the performance of a disambiguation method implement...
research
02/27/2015

Author Name Disambiguation by Using Deep Neural Network

Author name ambiguity decreases the quality and reliability of informati...
research
04/19/2019

Who wrote this book? A challenge for e-commerce

Modern e-commerce catalogs contain millions of references, associated wi...
research
02/05/2021

Effect of forename string on author name disambiguation

In author name disambiguation, author forenames are used to decide which...
