Semantic Similarity Measure of Natural Language Text through Machine Learning and a Keyword-Aware Cross-Encoder-Ranking Summarizer – A Case Study Using UCGIS GIS&T Body of Knowledge

05/17/2023
by Yuanyuan Tian, et al.

Initiated by the University Consortium for Geographic Information Science (UCGIS), the GIS&T Body of Knowledge (BoK) is a community-driven endeavor to define, develop, and document geospatial topics related to geographic information science and technologies (GIS&T). In recent years, the GIS&T BoK has undergone rigorous development in terms of topic reorganization and content updating, resulting in a new digital version of the project. While the BoK topics provide useful material for researchers and students learning about GIS, the semantic relationships among the topics, such as semantic similarity, should also be identified so that better, automated topic navigation can be achieved. Currently, related topics are defined manually by editors or authors, which may result in an incomplete assessment of topic relationships. To address this challenge, our research evaluates the effectiveness of multiple natural language processing (NLP) techniques in extracting semantics from text, including both deep neural networks and traditional machine learning approaches. In addition, a novel text summarizer, KACERS (Keyword-Aware Cross-Encoder-Ranking Summarizer), is proposed to generate semantic summaries of scientific publications. By identifying the semantic linkages among key topics, this work provides guidance for the future development and content organization of the GIS&T BoK project. It also offers a new perspective on the use of machine learning techniques for analyzing scientific publications and demonstrates the potential of the KACERS summarizer for the semantic understanding of long text documents.
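To make the idea of measuring semantic similarity among topics more concrete, the following is a minimal sketch of one common traditional baseline: representing each topic text as a TF-IDF vector and comparing vectors with cosine similarity. This is an assumed, generic technique shown for illustration only. It is not the paper's pipeline and not KACERS. The function names (`tfidf_vectors`, `similarity_matrix`) and the sample topic texts are hypothetical. A deep neural approach would typically replace the TF-IDF vectors with learned sentence embeddings while keeping the same cosine comparison.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        # Smoothed IDF: terms found in every document keep a small positive weight.
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(docs: list[str]) -> list[list[float]]:
    """Pairwise cosine similarity between all documents."""
    vecs = tfidf_vectors(docs)
    return [[cosine(u, v) for v in vecs] for u in vecs]


if __name__ == "__main__":
    # Hypothetical topic descriptions, not actual GIS&T BoK entries.
    topics = [
        "Spatial interpolation estimates values at unsampled locations.",
        "Kriging is a geostatistical interpolation method for unsampled locations.",
        "Metadata standards document geospatial datasets.",
    ]
    for row in similarity_matrix(topics):
        print([round(x, 3) for x in row])
```

In this toy input, the two interpolation topics share several terms and receive a positive score, while the metadata topic shares none with the first and scores 0.0. A purely lexical baseline like this misses synonyms and paraphrases, which is one motivation for comparing it against neural approaches.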

research
05/19/2022

Mapping Complex Technologies via Science-Technology Linkages; The Case of Neuroscience – A transformer based keyword extraction approach

In this paper, we present an efficient deep learning based approach to e...
research
12/19/2022

Graph-based Semantical Extractive Text Analysis

In the past few decades, there has been an explosion in the amount of av...
research
06/08/2018

Automatic Identification of Research Fields in Scientific Papers

The TERRE-ISTEX project aims to identify scientific research dealing wit...
research
08/11/2022

Figure Descriptive Text Extraction using Ontological Representation

Experimental research publications provide figure form resources includi...
research
11/05/2020

Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation

Over the last century, we observe a steady and exponentially growth of s...
research
08/30/2023

Conti Inc.: Understanding the Internal Discussions of a large Ransomware-as-a-Service Operator with Machine Learning

Ransomware-as-a-service (RaaS) is increasing the scale and complexity of...
research
08/09/2023

MetRoBERTa: Leveraging Traditional Customer Relationship Management Data to Develop a Transit-Topic-Aware Language Model

Transit riders' feedback provided in ridership surveys, customer relatio...