Text Similarity from Image Contents using Statistical and Semantic Analysis Techniques

08/24/2023
by Sagar Kulkarni, et al.

Plagiarism detection is one of the most researched areas in the Natural Language Processing (NLP) community. A good plagiarism detection system draws on NLP methods including semantics, named entities, and paraphrases, and produces detailed plagiarism reports. Detecting cross-lingual plagiarism requires deep knowledge of various advanced methods and algorithms to perform effective text similarity checking. Plagiarists, too, are becoming more sophisticated at concealing their identity to avoid being caught. They evade detection with techniques such as paraphrasing, synonym replacement, mismatched citations, and translation from one language to another. Image Content Plagiarism Detection (ICPD) has gained importance, using advanced image content processing to identify instances of plagiarism and ensure the integrity of image content. The issue of plagiarism extends beyond textual content, as images such as figures, graphs, and tables can also be plagiarized. However, image content plagiarism detection remains an unaddressed challenge. Therefore, there is a critical need to develop methods and systems for detecting plagiarism in image content. In this paper, a system is implemented to detect plagiarism from the contents of images such as figures, graphs, and tables. Introducing semantic algorithms such as LSA, BERT, and WordNet alongside statistical algorithms such as Jaccard and Cosine improved the efficiency and accuracy of plagiarism detection.
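
As a rough illustration of the two statistical measures named in the abstract, the sketch below computes Jaccard and cosine similarity between two strings. It is not the authors' implementation: the tokenizer, the function names, and the sample strings are assumptions made here for illustration, and the input is assumed to be text that has already been extracted from images (for example by OCR).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(text_a, text_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokenize(text_a)), set(tokenize(text_b))
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty inputs
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two term-frequency vectors."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical strings standing in for text extracted from two figures.
    source_text = "Accuracy of the model on the test set"
    suspect_text = "Model accuracy on the held-out test set"
    print("Jaccard:", jaccard_similarity(source_text, suspect_text))
    print("Cosine:", cosine_similarity(source_text, suspect_text))
```

Both measures compare surface tokens only, so synonym replacement or translation lowers the score even when the meaning is unchanged. That gap is what semantic methods such as LSA, BERT, and WordNet are meant to address.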


research
05/13/2023

PESTS: Persian_English Cross Lingual Corpus for Semantic Textual Similarity

One of the components of natural language processing that has received a...
research
06/10/2021

Analyzing Non-Textual Content Elements to Detect Academic Plagiarism

Identifying academic plagiarism is a pressing problem, among others, for...
research
02/28/2021

NLP-CUET@DravidianLangTech-EACL2021: Offensive Language Detection from Multilingual Code-Mixed Text using Transformers

The increasing accessibility of the internet facilitated social media us...
research
03/15/2022

TSM: Measuring the Enticement of Honeyfiles with Natural Language Processing

Honeyfile deployment is a useful breach detection method in cyber decept...
research
10/30/2020

A Cross-lingual Natural Language Processing Framework for Infodemic Management

The COVID-19 pandemic has put immense pressure on health systems which a...
research
07/03/2022

Multi-aspect Multilingual and Cross-lingual Parliamentary Speech Analysis

Parliamentary and legislative debate transcripts provide an exciting ins...
research
02/24/2023

STA: Self-controlled Text Augmentation for Improving Text Classifications

Despite recent advancements in Machine Learning, many tasks still involv...
