Content-Based Quality Estimation for Automatic Subject Indexing of Short Texts under Precision and Recall Constraints

06/07/2018
by Martin Toepfer, et al.

Semantic annotations must satisfy quality constraints to be useful for digital libraries, which is particularly challenging for large and diverse datasets. Confidence scores of multi-label classification methods typically refer only to the relevance of individual subjects and disregard indicators of insufficient content representation at the document level. We therefore propose a novel approach that identifies the documents, rather than the concepts, for which quality criteria are met. Our approach uses a deep, multi-layered regression architecture that incorporates a variety of content-based indicators. We evaluated multiple configurations on text collections from law and economics in which the available content is restricted to very short texts. Notably, we demonstrate that the proposed quality estimation technique can determine subsets of previously unseen data where considerable gains in document-level recall can be achieved while precision is upheld. Hence, the approach effectively performs a filtering that ensures high data quality standards in operative information retrieval systems.
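
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the per-document scores in `estimated_recall` are hypothetical stand-ins for whatever a quality estimator would output, and the function names, the toy documents, and the 0.7 threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def doc_precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Document-level precision and recall of one predicted subject set."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def select_documents(estimated_recall: Dict[str, float], min_recall: float) -> List[str]:
    """Keep the documents whose estimated recall meets the threshold."""
    return sorted(d for d, r in estimated_recall.items() if r >= min_recall)


def mean_recall(doc_ids: List[str],
                predictions: Dict[str, Set[str]],
                gold: Dict[str, Set[str]]) -> float:
    """Average true document-level recall over the given documents."""
    recalls = [doc_precision_recall(predictions[d], gold[d])[1] for d in doc_ids]
    return sum(recalls) / len(recalls) if recalls else 0.0


# Hypothetical toy data: predicted and gold subjects for three short texts.
predictions = {
    "d1": {"tax law", "vat"},
    "d2": {"inflation"},
    "d3": {"labour market", "wages"},
}
gold = {
    "d1": {"tax law", "vat"},
    "d2": {"inflation", "monetary policy", "central banks"},
    "d3": {"labour market", "wages", "unemployment"},
}

# Hypothetical estimator output (assumed, not computed here).
estimated_recall = {"d1": 0.9, "d2": 0.3, "d3": 0.75}

selected = select_documents(estimated_recall, min_recall=0.7)
recall_all = mean_recall(sorted(predictions), predictions, gold)
recall_selected = mean_recall(selected, predictions, gold)
```

In this toy setting the selected subset has a higher mean document-level recall than the full collection while precision is unchanged, which is the kind of effect the abstract describes. How well such a filter works in practice depends entirely on how accurate the estimated scores are.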


