A Comparison of Document Similarity Algorithms

04/03/2023
by Nicholas Gahman, et al.

Document similarity is an important part of Natural Language Processing and is most commonly used for plagiarism detection and text summarization. Finding the most effective document similarity algorithm overall could therefore have a major positive impact on the field. This report examines the numerous document similarity algorithms and determines which are the most useful. It approaches the question by grouping the algorithms into three categories: statistical algorithms, neural networks, and corpus/knowledge-based algorithms. The most effective algorithms in each category are then compared using a series of benchmark datasets and evaluations that test every area in which each algorithm could be used.
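
To make the statistical category concrete, the sketch below scores document pairs with TF-IDF weighting and cosine similarity. The abstract does not name the specific algorithms it evaluates, so the choice of TF-IDF is an assumption made purely for illustration, and the function names (`tokenize`, `tfidf_vectors`, `cosine`) and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed inverse document frequency (an illustrative choice of formula).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, not taken from the paper's benchmarks.
docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors(docs)
near = cosine(vecs[0], vecs[1])  # documents sharing most of their words
far = cosine(vecs[0], vecs[2])   # documents sharing no words
```

A purely lexical measure like this only rewards shared vocabulary, which is one reason a comparison against neural and corpus/knowledge-based approaches is of interest.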


